ON A POPULATION SAMPLE FOR GREECE
Samples of households and of names from electoral lists were
compared in connection with the work of the Allied Mission 
for Observing the Greek Elections. Only the sample of house- 
holds taken in the summer of 1946 is described in detail here. 
The 3052 sample households were drawn from a primary 
sample of about 200 areas. Villages, towns, and cities were 
classified on the basis of their 1940 populations, and a sample 
of areas and households was drawn, conforming closely to the 
principle of optimum allocation of resources and manpower. 
Geographic distribution was obtained automatically by draw- 
ing the areas at systematic intervals from the 1940 Census 
lists, which were arranged in geographic and alphabetic order. 
The sample gave an estimate of .988 for the ratio of the 1946 
to the 1940 population, with a standard error of 2.1 percent. 
This ratio corresponds to a 1946 population of 7.26 ± .15 million.
Numerous other population characteristics, such as distribution
by sex and age, literacy, employment, and industry, were estimated,
and the standard errors were computed for some of them. The entire
sample and the computations were completed in seven weeks.


IN THE summer of 1946 a combined mission of American and British
observers was sent to Greece by their governments to examine the
electoral lists which were being prepared for a plebiscite on the issue of 
whether George II was to be retained as King of the Hellenes. To be 
exact, this was the second mission, and was designated as AMFOGE 
II. An earlier mission was sent to Greece for observations on the
election of March 31, 1946, and a report was issued by the Department
of State, 1946: publication 2522. Both missions were formed to carry 
out the provisions of the Varkiza agreement. This agreement, signed 
February 12, 1945, a day after publication of the Yalta Declaration, 
required that Greece would hold both parliamentary elections and a 
plebiscite, and stated that “representatives of both sides agree that
for the verification of the genuineness of the popular will, the great 
allied powers shall be requested to send observers.” 

In carrying out its observations the mission made a number of sta- 
tistical surveys to obtain not only first hand information on registra- 
tion but also considerable background information on general popula- 
tion characteristics. It is this particular aspect of the mission’s statis- 
tical activities that will be presented here. 


THE PURPOSES OF THE SAMPLES 

The problem of investigating the electoral lists. Any electoral list can 
fail in two ways: 

(a) It may contain surplus names—that is, names of men who have 
died prior to a certain date and should either not be on the list at all or 
should be subtracted out by entry on a subsequent negative register; fic- 
titious names; multiple registrations; names of men who have lost their right 
to vote by conviction of certain felonies, desertion from the army, etc. 

(b) It may fail to contain names that should be on it. 

The two types of errors are independent; i.e., a list may be entirely 
free of the first type, yet not of the second, containing no surplus names 
but failing to contain the names of some citizens who are entitled 
to register. Likewise it may be free of the second type, yet not of the 
first. The purpose of the mission was to determine the extent and varia- 
bility of each type of error, and the reasons therefor, and to decide 
whether the electoral rolls could safely be used in a national
referendum.

The sample of names from the rolls. In order to determine the extent 
of the first type of error, samples of names were drawn with known 
probabilities from all the electoral lists in Greece, positive and
negative, basic and supplemental.¹ Every name in the sample was
investigated to see whether it was a valid entry. This investigation included
not only information from the man himself (when possible) and his 
neighbors, but also a further examination of the lists to determine 
whether his positive (or negative) entry was annulled, properly or 
improperly, by a negative (or positive) entry. The sample of names was 
not valid as a household sample; it was, as intended, a sample of names 
from the rolls. This sample will therefore not be mentioned again here 
except in brief. 

1 For an explanation of the nature of the Greek electoral lists see Jessen, Kempthorne, Daly, and
Deming, “A chapter in quantitative political science,” submitted to The Journal of Political Science,
1947.

The household sample. This is the sample that forms the subject of
this article. It was taken primarily to investigate the second type of
error in the rolls. It was a sample of households drawn with known 
probability from all of Greece and it could thus be used for obtaining 
population data as well as for measuring the extent of the second type 
of error. One very important population characteristic in the investiga- 
tion of the electoral lists is the total number of male Greek citizens 
21 and over, which determines an upper limit to the potential number 
of electors. 

The population sampled includes those inhabitants who are reported 
as belonging to households. A household was defined as one person
living separately or a group of people living together as a unit (in
general, a family). Observers were required to include as members of households
“all persons associated with it who will not be included as members of 
some other household as defined here.” People temporarily absent from 
a household, such as members of the armed forces, were to be included. 
It should be noted that people who would not come into this enquiry 
cannot in practice be taken into account for election purposes, no 
matter how perfect the electoral procedure. In the opinion of the authors 
the total number of people thus excluded will not materially affect the 
estimates given here. 

The form used for eliciting the information from the households 
contained spaces for sex, age, literacy, employment, class of worker, 
and occupation, aside from the spaces required for information con- 
cerning registration. The form used is shown at the close. The informa- 
tion obtained from the household concerning the registration, citizen- 
ship, and residence of every male 21 and over was checked against the 
local electoral rolls, positive and negative, to determine whether this 
information was correct—more precisely, to determine the extent of 
the second type of error (Type b as described above). 

This paper presents the method of selecting the household sample, 
the method of estimating the various population characteristics, the 
calculation of the sampling errors, and the results on some of the popu- 
lation characteristics of Greece. 


THE HOUSEHOLD SAMPLE 


The sampling plan in brief. The household sample was so designed 
that every household had a known probability of being drawn into 
the sample. It was moreover drawn in such manner that the bias of se- 
lection was eliminated, and the standard errors computable. These 
features were considered absolutely necessary because of the impor- 
tance of the decisions to be based on the data. Some of the towns and 
villages selected for the sample were accessible only with difficulty, but
no substitutions of either village or household were permitted. Oft- 
times a village could be reached only after a day’s journey by jeep, 
supplemented by transportation by burro or on foot. Boats furnished 
transportation to the islands and to coastal towns inaccessible other- 
wise because of swamp or mountains. Aeroplanes were used occasion- 
ally for dispatching observers and for supervision of the work. 

The field work required the services of 65 observer-teams for 3 weeks. 
Each observer-team consisted of an observer, an interpreter, a driver, 
and the indispensable jeep. The British observers were officers in His 
Majesty’s and Dominions’ Forces; the American observers were civil- 
ians (mostly instructors and graduate students in statistics, govern- 
ment, political science, and economics) dispatched to Greece for that 
purpose. 

In regard to the time required for the entire job, it should be noted 
that the first contingent of the sampling staff arrived in Athens about 
June 23, and the report of the Mission was made to the press August 
19—a total elapsed time of about 8 weeks. 

On the basis of estimates of the variability of certain important 
characteristics of the population from one area to another, and in the 
light of information obtained on a similar mission in the spring of 
1946, a sample of the following description was indicated: about 200 
cities, towns and villages drawn from strata suitably created on the 
basis of geographical divisions and the population in 1940, and includ- 
ing 1/500th of the population of Greece. 

The primary sampling unit, which we shall refer to as “place,” was 
a village, town or city, along with its satellite villages and any
households in the open country (rare in Greece) within a carefully delimited
area whose combined population appeared in the 1940 Greek census 
as a single entry. Several stages of sub-sampling were usually used 
within a primary area, the sampling unit being a group of about 15 
households and the ultimate unit the household.² The sampling ratios
in the various stages were adjusted so that the over-all sampling ratio
in every stratum was uniformly 1/500. Thus the sample was
self-weighting. This self-weighting feature was extremely important
because of the simplicity introduced into the tabulations, which were
necessarily carried out by hand under great pressure for speed. For
obtaining the total population of Greece, a regression estimate was
used. More explicitly, the sample was used to obtain estimates of (i)
the change in population since 1940; and (ii) the proportions of the
population in various subclasses. The mathematical expressions for
the regression estimates and their standard errors will be given further on.

2 See Schemes 2a and 2b further on; also the appendixes.

The sample of villages, towns, and cities. The basis for the sample de- 
sign was the 1940 population census of Greece, which, although out of 
date in many respects, nevertheless contained information valuable 
for controlling and diminishing the errors of sampling. The census
publication distinguishes two types of area: first, the koinotis (plural,
koinotetes), which is a small community or village; and second, the
demos (plural, demoi), which is usually a town or city with more than
10,000 population, or else the capital of a nomos (plural, nomoi), which
represents a province. The census of 1940 gave the population of each
of the 5690 koinotetes and demoi covering the whole of Greece. Many
of the koinotetes and demoi include more than one community or popu- 
lation group. The census publication showed the name of only the main 
community in an area, but a list and a map were obtained from the 
Ministry of Interior which showed the names and locations of all the 
populated centers within each koinotis and demos. These smaller vil- 
lages and towns were termed “satellites” and for purposes of sampling 
were regarded as parts of the listed koinotis or demos. Thus every part 
of Greece had a chance to be included in the sample. 

At the time of the 1940 census Greece was divided into 38 nomoi.
These in turn were divided into eparchies, and the eparchies into
koinotetes and demoi. Thus, every square foot of Greece is in one or
another koinotis or demos as listed in 1940. The Dodecanese Islands 
were not included. 

The first step in the sampling plan was to subdivide the koinotetes 
into four classes and the demoi into two, according to their 1940 popu- 
lations. Table 1 shows the plan of classification and the sampling ratios 
that were used for selecting the koinotetes and demoi, and for selecting 
the sample households from within them. Attention is called to the fact
that the product of the two sampling ratios standing side by side
in the fourth and fifth columns is always 1/500. Had straight
multiplication by 500 been intended, instead of a regression estimate,
probably 12 or 15 size-classes of community would have been used, instead
of six.
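
The constancy of the over-all ratio can be checked mechanically. The sketch below is an illustration only, not part of the original computations: it multiplies the two ratios transcribed from Table I for each size-class. The names `TABLE_I_RATIOS` and `overall_ratios` are hypothetical.

```python
from fractions import Fraction

# Ratios transcribed from Table I:
# (size-class code, ratio for selecting places, ratio for selecting households within a place)
TABLE_I_RATIOS = [
    (1, Fraction(1, 100), Fraction(1, 5)),
    (2, Fraction(1, 50), Fraction(1, 10)),
    (3, Fraction(1, 20), Fraction(1, 25)),
    (4, Fraction(1, 5), Fraction(1, 100)),
    (5, Fraction(1, 2), Fraction(1, 250)),
    (6, Fraction(1, 1), Fraction(1, 500)),
]


def overall_ratios(rows):
    """Return {size_class: place_ratio * within_place_ratio}."""
    return {code: place * within for code, place, within in rows}


if __name__ == "__main__":
    for code, ratio in overall_ratios(TABLE_I_RATIOS).items():
        print(f"size-class {code}: over-all sampling ratio = {ratio}")
```

Exact fractions are used rather than floating-point numbers so that equality with 1/500 can be tested without rounding error.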

The sample koinotetes in Class 1 were selected by starting with a 
random number between 1 and 100, such as 37, and marking off in the 
census publication the 37th, 137th, 237th, ... koinotis of the size 
specified (0-499). The sample koinotetes and demoi in the other classes 
were drawn in a similar manner. All 22 cities having 25,000 or more 
inhabitants in 1940 were in the sample. 
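
The systematic selection just described can be sketched as follows. This is a minimal illustration under stated assumptions: the census list is represented by a plain Python list, and the function name `systematic_sample` is hypothetical. The interval of 100 and the start of 37 are the figures used in the text for Class 1.

```python
import random


def systematic_sample(units, interval, start=None, rng=random):
    """Select every `interval`-th unit after a random start in 1..interval.

    Positions are 1-based, matching "the 37th, 137th, 237th, ..." in the text.
    """
    if interval < 1:
        raise ValueError("interval must be a positive integer")
    if start is None:
        start = rng.randint(1, interval)  # random start, both ends inclusive
    if not 1 <= start <= interval:
        raise ValueError("start must lie between 1 and interval")
    return [units[pos - 1] for pos in range(start, len(units) + 1, interval)]


if __name__ == "__main__":
    # Hypothetical census list of 1000 places, numbered in list order.
    census_list = list(range(1, 1001))
    print(systematic_sample(census_list, interval=100, start=37))
```

Because the census lists were arranged in geographic and alphabetic order, a fixed interval spreads the selected places across the whole list, which is how the geographic distribution was obtained automatically.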


It was necessary to consider in the over-all sample design the needs
of both aspects of the mission’s work, that is to investigate the two 
kinds of errors in the electoral lists (vide supra). In order that a sample 
design may be made as efficient as possible, it is essential that the de- 
signer know, or have estimates of, the nature and amount of variability 
existing in the universe under consideration, and the costs, in terms of 
time and equipment, of performing the various possible sampling op- 
erations. Although conditions in Greece are certainly different in many 
respects from those in the U. S., it was possible to use experiences ac- 
quired in the U. S. to provide useful answers to a number of those 
questions, supplemented by experience gained from the first mission 
and from trial runs. The principal items of importance from the stand- 
point of design are listed below. 


1) Travel. The average distance that could be covered to or from a
randomly selected sample place in Greece by jeep, burro, or boat, was
estimated at 75 miles per day.

2) Drawing the sample of households. The average time required to make 
lists or maps and to designate the sample households thereon according to
the prescribed rules was estimated to be one sample place per day. 

3) Interviewing. The rate of interviewing was estimated at 15 sample 
households per day. 

4) Locating and examining the electoral lists. The observer was required
to find the local electoral lists (which sometimes meant traveling to another
city or town) and to draw off a sample of names according to prescribed
rules, along with certain identifying information, so that he could
investigate errors of Type a (next step). He was also required to examine the
information appearing on the electoral lists concerning the men previously
interviewed in the household sample, to investigate errors of Type b. It was
estimated that the observer would require a whole day to do this work.

5) Investigating the registrants. It was estimated that the investigation of
the errors of Type a, which required interviewing the registrants drawn from
the electoral lists in Step 4, and making enquiries of neighbors, would proceed
at an average rate of 20 names per day.


With these advance estimates of cost in mind, it was possible to 
draw up a sampling plan that would produce the reliability required 
and which at the same time would approach the optimum use of re- 
sources. In consideration of the above costs of operation, the number 
of observers (65), and the period of 3 weeks allotted to the field-work, 
it was decided that about 15 sample households or one average day’s 
interviewing in one sample place (including satellite villages), and
about 200 sample places, would constitute a good workable plan. Each
observer would then have 3 sample places to complete in the 3 weeks, 
or one sample place per week, the time being allocated as follows: 
1) travel to and from the sample place ........................ 2 days
2) drawing the sample of households ........................... 1 day
3) interviewing the sample households ......................... 1 day
4) locating and using the electoral lists ..................... 1 day
5) investigating names drawn from electoral lists ............. 1 day
   Total ...................................................... 6 days
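
The arithmetic behind this schedule can be restated as a short sketch. The figures are those quoted in the text (65 teams, 3 weeks, about 15 households per place); the six-day working week is an assumption inferred from the rule of one sample place per week, and all names are illustrative.

```python
# Days allotted to one sample place, following the five operations listed in the text.
DAYS_PER_PLACE = {
    "travel to and from the sample place": 2,
    "drawing the sample of households": 1,
    "interviewing the sample households": 1,
    "locating and using the electoral lists": 1,
    "investigating names drawn from electoral lists": 1,
}


def field_plan(teams, weeks, households_per_place, working_days_per_week=6):
    """Return the number of places and households the field force can cover.

    working_days_per_week=6 is an assumption, not a figure stated in the source.
    """
    days_per_place = sum(DAYS_PER_PLACE.values())
    places_per_team = (weeks * working_days_per_week) // days_per_place
    places = teams * places_per_team
    return {
        "days_per_place": days_per_place,
        "places": places,
        "households": places * households_per_place,
    }


if __name__ == "__main__":
    print(field_plan(teams=65, weeks=3, households_per_place=15))
```

Under these assumptions the plan yields 195 places, in line with the "about 200 sample places" adopted in the design.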


The decision was made, on the above considerations, to adopt as a
sampling unit an “expected” 15 households within a sample area,
this number being based on the 1940 Census of Greece. For example, in
a village of 750 inhabitants in 1940 (150 households with 5 people per
household), a sampling ratio of 1 in 10 would result in an “expected”
15 households (see size-class 2, Table I). For places in this class with
populations other than 750 the “expected” households will, of course,
differ from 15. It would have been possible to make use of the 1940
population data for each place separately (instead of by size class) to
determine its particular within-place sampling ratio in order to obtain
a constant size of sampling unit,³ but this would have required too
much time for the selection of the sample places, and moreover, under
the circumstances, would have proved cumbersome in the field. Such
a plan offers [illegible] if the method of estimating
totals is the simple one of multiplying sample totals by the
reciprocal of the over-all sampling fraction, but it was believed that
the loss of information resulting from the variation in size of sampling
unit (caused by varying size of village) would mostly be recovered in
the plan that was adopted, wherein a regression estimate was used
(vide infra).

A sample of 200 areas of 15 households per area contains a total
of 3000 households. On the assumption of a total population of 7½
million people, or 1½ million households (5 people per household), the
over-all sampling ratio is 3000 divided by 1½ million, or 1 in 500. It
was decided to keep the sampling ratio constant over the whole of
Greece, regardless of size of community. A preponderant justification
for the constant ratio was simplicity of tabulation, enforced by the
narrow time requirements.


The actual procedure for drawing the sample of villages, towns, and
cities has been described earlier. This was merely the means of
putting into effect certain theoretical considerations of sample allocation
which will now be described. The procedure of drawing the sample
areas can be looked upon as creating, within each size class,
geographic-alphabetic strata of 500 sampling units, 15 households to the
sampling unit as measured in 1940 (geographic-alphabetic because the demoi
and koinotetes are listed alphabetically by nomoi). In size class 1 each
stratum consists of 100 places, each of 5 sampling units; in size class
2, each stratum consists of 50 places, each of 10 sampling units; and so
on (cf. Table I).

3 M. H. Hansen and W. N. Hurwitz, “On the theory of sampling from finite populations,” Annals
of Mathematical Statistics, Vol. xiv, 1943, pp. 333-362.

364 AMERICAN STATISTICAL ASSOCIATION

One sampling unit (15 households) was drawn per stratum. The 
constant sampling ratio of 1 in 500, sufficiently justified on the grounds 
of simplicity as mentioned above, happens to accord fairly well with the 
principle of optimum allocation⁴ according to which

$$\frac{n_i}{N_i} = k\,\frac{\sigma_i}{\sqrt{c_i}} \tag{1}$$

where k is a proportionality constant, $n_i$ is the number of sampling
units drawn from Stratum i in the sample (one in this problem), $N_i$
is the number of sampling units originally there (500), $c_i$ is the cost of
carrying out the sampling per sampling unit, and $\sigma_i$ is the standard
deviation of the desired characteristic defined by the equation

$$\sigma_i^2 = \frac{1}{N_i}\sum_{j=1}^{N_i}\left(x_{ij} - \bar{x}_i\right)^2 \tag{2}$$

wherein $x_{ij}$ represents the value of the desired characteristic in
sampling unit j, and $\bar{x}_i$ represents the average value of this characteristic
in all $N_i$ sampling units of Stratum i. As a great many characteristics
were to be measured (population change since 1940, number of Greek
male citizens 21 and over, number registered, and many others) and
as the strata created were of approximately equal size, it could only be
assumed for allocation purposes that $\sigma_i$ was the same for all strata. It
has already been explained that $c_i$ is about the same in city and village;
hence the right-hand member is practically constant and demands the
constancy of $n_i/N_i$. But regardless of the principle of optimum
allocation, the constancy of $n_i/N_i$ would probably have been insisted upon
to gain time and eliminate complications in the tabulation. Tables I
and II show the sample plan in summary.
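
The argument for a constant sampling fraction can be illustrated numerically. The sketch below assumes the usual cost-weighted form of optimum allocation, in which the sampling fraction in a stratum is proportional to its standard deviation divided by the square root of its unit cost. The function name and the input values are hypothetical, chosen only to show the behavior.

```python
import math


def optimum_fractions(sigmas, costs, k):
    """Sampling fraction n_i / N_i = k * sigma_i / sqrt(c_i) for each stratum."""
    if len(sigmas) != len(costs):
        raise ValueError("sigmas and costs must have the same length")
    return [k * s / math.sqrt(c) for s, c in zip(sigmas, costs)]


if __name__ == "__main__":
    # Equal variability and equal unit cost in every stratum (the planning
    # assumption in the text) give one constant fraction, here 1/500.
    print(optimum_fractions([1.0] * 4, [1.0] * 4, k=1 / 500))
    # Illustrative contrast: a stratum four times as costly per unit
    # would receive half the sampling fraction.
    print(optimum_fractions([1.0, 1.0], [1.0, 4.0], k=1 / 500))
```

With equal inputs the rule reduces to the uniform 1 in 500 actually used, which is why the simpler plan cost little in efficiency.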

Experience has shown that even crude approximations to optimum 
allocation are highly preferable to no guide at all. The worst to be ex- 
pected is some loss in efficiency and some loss in advance control of the 
sampling error. 

4 Optimum allocation was expounded by Neyman for simple stratified sampling in an article entitled
“On the two different aspects of the representative method,” Journal of the Royal Statistical Society,
vol. xcvii, 1934, pp. 558-625, pp. 579-80 in particular.
It should be noted that no bias is introduced by failure of any of the
above assumptions. It should be noted, moreover, that the actual sam- 
pling errors as calculated from the returns are entirely independent of 
the above assumptions that were made in the planning. 


TABLE I
SUMMARY OF THE SAMPLE DESIGN

 (1)    (2)               (3)          (4)             (5)                  (6)        (7)
                                           Sampling ratios                  Number of places
Size-   Population        Assumed      For selection   For selection of     In size-   In
class   in 1940           average      of sample       names and            class      sample
code                      population   places          households within
                          in 1940                      a sample place

For koinotetes
  1     0-499                350       1/100           1/5                   2147        20
  2     500-999              750       1/50            1/10                  2049        40
  3     1000-4999           2500       1/20            1/25                  1366        70
  4     5000 and over       7000       1/5             1/100                   54        10
                                                       Totals                5616       140
For demoi
  5     Under 25,000      17,000       1/2             1/250                   52        26
  6     25,000 and over        -       1/1             1/500                   22        22
                                                       Totals                  74        48

Note. The total number of sample places is perhaps better counted as 199, and not as 188 (140+48)
shown in Table I. Every demos of 25,000 or more inhabitants was counted only once in Table I but
the large ones contributed several sample parishes; 7 parishes from the City of Athens, for instance,
were in the sample. Actually, data were obtained from 199 places and “semi-places” (parishes) referred
to in the text as “about 200.”





TABLE II

SIZES OF STRATA AND EXPECTED NUMBER OF INHABITANTS TO BE
SAMPLED FROM EACH, BY COMMUNITY SIZE-CLASS

Size-class   Number of       Population        Expected number   “Expected” number
code         sample places   (estimated from   of inhabitants    of households*
                             1940 census)      in sample         in sample
                             per stratum       per stratum       per stratum

    1            20             35 000              70                14
    2            40             37 500              75                15
    3            70             50 000             100                20
    4            10             35 000              70                14
    5            26             34 000              68                14
    6            33†            37 500†             75                15

* A household was assumed to consist of 5 inhabitants.
† See note in Table I.
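
The entries of Table II for size-classes 1 to 5 follow from the assumed average populations and place-selection ratios of Table I. The sketch below reproduces that arithmetic as an illustration. It rests on one reading of the design, namely that a stratum contains as many places as the reciprocal of the place-selection ratio; the names are hypothetical, and size-class 6 is omitted because Table I gives no assumed average population for it.

```python
from fractions import Fraction

OVERALL_RATIO = Fraction(1, 500)   # uniform over-all sampling ratio
PEOPLE_PER_HOUSEHOLD = 5           # assumption stated under Table II

# (size-class code, assumed average 1940 population of a place, place-selection ratio)
CLASSES = [
    (1, 350, Fraction(1, 100)),
    (2, 750, Fraction(1, 50)),
    (3, 2500, Fraction(1, 20)),
    (4, 7000, Fraction(1, 5)),
    (5, 17000, Fraction(1, 2)),
]


def stratum_row(avg_population, place_ratio):
    """Return (population per stratum, expected inhabitants, expected households)."""
    places_per_stratum = 1 / place_ratio            # e.g. 100 places in class 1
    population = avg_population * places_per_stratum
    inhabitants = population * OVERALL_RATIO
    households = inhabitants / PEOPLE_PER_HOUSEHOLD
    return int(population), float(inhabitants), round(float(households))


if __name__ == "__main__":
    for code, avg_pop, ratio in CLASSES:
        print(code, stratum_row(avg_pop, ratio))
```

The household figure for size-class 5 works out to 13.6 and is rounded to 14, as in the table.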
The selection of the ultimate sampling unit, the household. A sample of
households was selected from every place that was drawn into the sam- 
ple. The procedure in every instance was to select a prescribed propor- 
tion of the households of the place. The procedure varied. In the 
smaller villages, principally in the first two community size-classes, it 
was sometimes possible to procure a list of all the families, which with 
some effort could be made complete. When this was the case, the draw- 
ing of the sample families was very simple; a starting number and a 
sampling interval were applied to the household lists (this being,
incidentally, the plan of drawing names from the electoral lists for
examining errors of the first type). In some cities and towns, however, special
procedures were designed, depending on the size of place, whether it 
was scattered or concentrated, whether readily divisible into its ali- 
quot parts or parishes, and particularly on whether a reliable map could 
be found, or whether one had to be constructed on the spot. In sum- 
mary, several plans for drawing the samples of households were used, 
depending on the circumstances met. These plans are described briefly 
below, and they will be referred to as Schemes 1a, 1b, 2a, or 2b. 


1. A sample of households drawn directly from accurate and up-to-date 
lists of households covering an area (village, parish, or other area, even an 
entire koinotis). The method of drawing was to use a random start with 
the appropriate sampling interval. The list of households could be of one 
or another type: 

(a) A list of households by name and address. An example is furnished by 
the priest’s list of the families in his parish which could easily be con- 
verted into a list of households. However, any list had to be carefully 
checked and brought up-to-date before it could be used. In small 
communities this revision was accomplished by viewing the area in 
company with the priest or other prominent citizens who know prac- 
tically all the families and who would assist in finding any errors or 
omissions.

(b) A “map-list,” identifying the families by location on a map which 
was usually of necessity made on the spot. The households were
numbered serially on the map and the sampling interval applied as 
before. 

2. A sample of blocks, followed by a further sampling of households from 
within the sample blocks. To carry out this scheme, a map had to be pro- 
vided or made which showed the street pattern and definitely identified 
the boundaries of the blocks or any conveniently small parcels of land 
within the area to be sampled. There were two methods of drawing the 
sample of households. 

(a) Blocks chosen for the sample with probabilities in proportion to their 
estimated sizes (size being measured by the number of households 
within the block, not by area).³ The households within the sample
block were then carefully map-listed and numbered serially. The
sample of households was drawn with a random start and constant
interval within any one sample block, but the interval was adjusted
from block to block to give a constant number of sample households 
as calculated from the estimated sizes. Block sizes were estimated 
prior to sampling only approximately, as by cruising the area and 
making quick eye-estimates. The drawing of blocks with prob- 
abilities proportionate to sizes was easy to carry out by applying the 
proper sampling interval to the cumulative totals of sizes. (See the 
example in Appendix A.) 

(b) Blocks chosen for the sample with equal probabilities. The house- 
holds within the sample blocks were then map-listed and numbered 
serially throughout, and a sample of them drawn by applying a 
constant sampling interval over the entire list. (See the example 
in Appendix B.) 

In both Schemes 2a and 2b, strong geographic control was achieved 
by numbering the blocks in a serpentine fashion. 
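
Scheme 2a can be sketched in a few lines. The example is a minimal illustration of drawing blocks with probability proportionate to estimated size by applying an interval to the cumulative totals; it is not the worked example of Appendix A. The block sizes, the interval, and the function names are hypothetical.

```python
import random
from itertools import accumulate


def pps_systematic(sizes, interval, start=None, rng=random):
    """Return indices of blocks hit by start, start+interval, ... on the cumulated sizes.

    A block whose estimated size exceeds the interval may be selected more than once.
    """
    if interval < 1:
        raise ValueError("interval must be a positive integer")
    if start is None:
        start = rng.randint(1, interval)
    cumulative = list(accumulate(sizes))
    total = cumulative[-1] if cumulative else 0
    selected = []
    block = 0
    for point in range(start, total + 1, interval):
        # Advance to the first block whose cumulative total reaches the point.
        while cumulative[block] < point:
            block += 1
        selected.append(block)
    return selected


def within_block_interval(estimated_size, households_wanted):
    """Interval giving a constant expected take from a block of the estimated size."""
    return estimated_size / households_wanted


if __name__ == "__main__":
    sizes = [10, 25, 5, 40, 20]          # eye-estimates of households per block
    print(pps_systematic(sizes, interval=50, start=12))
    print(within_block_interval(estimated_size=40, households_wanted=5))
```

Numbering the blocks in serpentine order before cumulating, as the text describes, keeps neighboring blocks adjacent in the list so that the fixed interval also spreads the sample geographically.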


Completeness of the sample. A series of controls by maps and census 
information served to ensure that every village and every household 
within a primary area had its proper chance of inclusion in the sample, 
and also—equally important—to ensure that no outside area could 
come in. Excellent maps were obtained from the British Forces in 
Greece, and more detailed ones from the Greek government. 

The completeness of the sample was remarkable, and the cooperation
of the Greek people unforgettable. Actually, information was obtained
concerning all but 10 of the 3052 households in the sample.
(These 10 were allocated to the tabulations by the simple device of
assigning to each of them the characteristics of the preceding family.)
Nine of the 10 families could not be found at home after repeated trials.
The 10th family was a refusal, the only one, concerning which it
may be remarked that it had recently moved in from a foreign country
and contained no eligible voters. There was, of course, an occasional
omission of an item of information, but the entire field-job was
remarkably free of imperfections.
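
The device used for the missing households amounts to copying the record of the preceding family. A minimal sketch follows; it assumes that "preceding" means the previous household in listing order, which the text does not spell out, and the record layout and function name are hypothetical.

```python
def fill_from_preceding(records):
    """Replace each missing record (None) with a copy of the nearest preceding observed one."""
    filled = []
    last_observed = None
    for record in records:
        if record is None:
            if last_observed is None:
                raise ValueError("no preceding household to copy from")
            record = dict(last_observed)  # copy, so later edits do not alias
        else:
            last_observed = record
        filled.append(record)
    return filled


if __name__ == "__main__":
    households = [{"persons": 4, "males_21_plus": 1}, None, {"persons": 6, "males_21_plus": 2}]
    print(fill_from_preceding(households))
```

With only 10 of 3052 households treated this way, the effect of the substitution on the tabulations is necessarily slight.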

Estimation of total population. Two methods were compared for the 
estimation of the total population of Greece in 1946. The simpler esti- 
mate would be that obtained by multiplying the sample total by 500, 
the reciprocal of the sampling fraction. As pointed out above in the 
discussion of the sample design, however, it is possible to utilize the 
information concerning the population in 1940 for a regression
estimate, which should and did have a lower sampling error. For each
sample place there were two numbers, x and y:

(x) The “expected” number of people in 1940, which is the number of people
in the area in 1940 multiplied by the within-place sampling ratio for
that place.

(y) The number of people counted in the sample households in 1946.


A plot of the points (x, y) is given in Chart I. The 199 sampling units
represented in Chart I may for the present purpose be regarded as a
random sample of the total number of sampling units in the whole
population (this total number is of the order of 7½ millions divided by
500, that is, 15,000). It is clear from the chart that the relationship
between x and y is approximately linear and that the line may be passed
through the origin. The problem is then to estimate the parameter $\beta$ in
the equation

$$y = \beta x \tag{3}$$

CHART I
REGRESSION OF THE OBSERVED POPULATION IN SAMPLING UNITS AGAINST THE
“EXPECTED” POPULATION. THE BROKEN LINES INDICATE THE EXTENT
OF VARIABILITY WITHIN X-ARRAYS.

If the estimate of $\beta$ is b, the total population in 1946 may be estimated
as b multiplied by the total population in 1940 (the sum of x over all
possible sampling units). The best linear unbiased estimate b of $\beta$ is
given by the equation⁵

$$b = \frac{\sum w x y}{\sum w x^2} \tag{4}$$

where w is the weight of the observation y, and is the reciprocal of $\sigma_y^2$.
The variance $\sigma_y^2$ of y within x-arrays clearly increases with x, and in
order to estimate the relationship, the y-values were divided into
groups according to the x-values, the range of x within a group being
5 units. The within-group variance of y will be a near approximation
to the variance of y in the x-arrays of the group, because the
within-group correlation of y and x will be small. Examination of these
estimates of variances indicated that they were approximately
proportional to the square of x, and Chart II contains a plot of $s^2$, the
estimated variance, against x.

CHART II
THE RELATIONSHIP BETWEEN THE VARIANCE $s^2$ WITHIN x-ARRAYS, AND x.

5 W. G. Cochran, “Sampling theory when sampling units are unequal sizes,” Journal of the
American Statistical Association, 37: 199-212, 1942. See also W. Edwards Deming, Statistical
Adjustment of Data, John Wiley and Sons, 1943, pp. 31-33.

The variance $\sigma_y^2$ of y is therefore equal to
$x^2$ multiplied by the variance per unit weight. With this relationship the
coefficient of variation of y is constant because the mean value of y for a
given x is proportional to x. It follows that the best linear unbiased
estimate of $\beta$ is

$$b = \frac{\sum \frac{1}{x^2}\,xy}{\sum \frac{1}{x^2}\,x^2} = \frac{1}{n}\sum \frac{y}{x} \tag{5}$$

where 7 is the number of sample pairs of values of x and y. The variance 
per unit weight may then be estimated as 
1 (y 7 bx)? 


n— 1 x? 


= ——| 2 (2) - nb? | (6) 


x 





and the variance of b is equal to so?/n. It was found that 


b=0.988, 
<2 =0.0861, 
Var b=0.00043, 
o,=0.021=2.1% 
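
The estimator of Eq. 5 and the variance estimate of Eq. 6 can be sketched in a few lines of Python. This is only an illustration: the function name `ratio_estimate` and the four (x, y) pairs are hypothetical and are not the survey data. The reported figures are consistent with n of about 200, since 0.0861/200 is about 0.00043.

```python
import math

def ratio_estimate(pairs):
    """Weighted regression through the origin with weights 1/x**2.

    pairs: iterable of (x, y), x = "expected" (1940) population of a
    sampling unit, y = observed (1946) population; x must be positive.
    Returns (b, s0_sq, var_b, se_b) following Eqs. 5 and 6.
    """
    ratios = [y / x for x, y in pairs]
    n = len(ratios)
    b = sum(ratios) / n                                          # Eq. 5
    s0_sq = (sum(r * r for r in ratios) - n * b * b) / (n - 1)   # Eq. 6
    var_b = s0_sq / n
    return b, s0_sq, var_b, math.sqrt(var_b)

# Hypothetical illustration only (not the Greek sample).
pairs = [(10, 9), (20, 22), (40, 38), (5, 5)]
b, s0_sq, var_b, se_b = ratio_estimate(pairs)

# The same b obtained directly from Eq. 4 with w = 1/x**2.
b_eq4 = (sum(x * y / x**2 for x, y in pairs)
         / sum(x * x / x**2 for x, y in pairs))
```

The agreement of `b` and `b_eq4` reflects the step from Eq. 4 to Eq. 5: with weights proportional to 1/x², the weighted estimate reduces to the plain mean of the ratios y/x.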


It may be noted that the above method of analysis provides two esti-
mates of the variance $s_0^2$ per unit weight, Eq. 6, and the one provided
by the relationship between $s^2$ and $x^2$ (for which a maximum likelihood
estimate may be obtained if the distribution of y within x-arrays is
normal). These two estimates are of course not independent, but if the
distribution of y within x-arrays is normal with the stated variance, a
comparison of the two estimates gives a rough indication whether a
straight line through the origin adequately represents the data. In the
present case, the data are not sufficiently numerous to make this com-
parison. To indicate the reliability of the assumed variance relation-
ship, there are also given on Chart I lines to show the assumed varia-
bility of y in the x-arrays; because the variance is assumed proportional
to $x^2$, the standard deviation is proportional to x, so that lines of slope
$b \pm s_0$ through the origin should contain about two-thirds of the y-values
in each x-array, and this is seen to be about right. There are a few
anomalous points, the existence of which may in part at least be as-
cribed to the use of the less accurate sampling schemes in a few places,
or large changes in population between 1940 and 1946.
The population of Greece in 1940 was 7,344,860, so that the esti-
mate of the population in 1946 is 7,257,000 with a standard error of 2.1
per cent, or 152,000. This estimate may be compared with that obtained
by multiplying the sample total by 500, namely 6,975,000, with a stand-
ard error of 3.7 per cent. The precision of the regression estimate is
therefore about three times that of the simple estimate, the ratio of the
variances being (3.7/2.1)², or about 3.1.
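
The arithmetic of this paragraph can be checked with a short sketch. All figures are those quoted in the text; the variable names are ours, and "precision" is taken here to mean the inverse ratio of the variances, which is an assumption about the authors' usage.

```python
POP_1940 = 7_344_860        # census population of Greece, 1940
b = 0.988                   # estimated ratio of 1946 to 1940 population
rel_se_regression = 0.021   # standard error of the regression estimate
rel_se_simple = 0.037       # standard error of the simple estimate

estimate_1946 = b * POP_1940                  # regression estimate
se_1946 = rel_se_regression * estimate_1946   # standard error in persons

simple_estimate = 6_975_000                   # sample total times 500
sample_total = simple_estimate / 500          # persons implied in the sample

# Relative precision as the inverse ratio of the squared standard errors.
relative_precision = (rel_se_simple / rel_se_regression) ** 2
```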

Sampling errors of other results. The main use of the figures is for 
administrative purposes for which approximate measures of precision 
will suffice. Evaluation of the exact sampling errors for most of the re- 
sults would be prohibitively complicated. In the first place, various 
schemes of sampling households within sample places were used. Sec- 
ond, the results were obtained by multiplying the proportions esti- 
mated from the sample by the estimated total population, and both the 
proportions and the estimated total population are subject to sampling 
error. Approximate measures of accuracy, however, are essential and 
obtainable. 

Approximate standard errors for several characteristics are shown 
in Table III. For other characteristics it may be surmised that there is 
a fairly close relationship between the standard error of an estimated 
figure and the magnitude of that figure. Such relationships are usual 
with survey data. Thus, from the magnitude of the errors listed in Ta- 
ble III, standard errors of many other characteristics may be qualita- 
tively inferred. 


THE RESULTS: POPULATION CHARACTERISTICS OF GREECE 


Six tables are presented herewith (Tables IV-IX). The figures in 
them were derived from hand tabulations carried out in Athens, sub- 
sequent to the completion of the field-work on August 7, 1946, and 
prior to the press announcement of August 19 in regard to the fairness 
of the electoral lists. All tables (except Table IX) include the armed 
forces of Greece. 

Table IV shows the population by sex and age classes. The standard 
error of sampling of the estimate of the total population of Greece 
(7.26 million) is 2.1 per cent, which fixes the total population with 95 
per cent probability within about 4 per cent. The decrease in popula- 
tion between 1940 and 1946 is indicated by the sample to be 1.2 per 
cent. This indicated decrease, however, is subject to the aforemen- 
tioned sampling error of 2.1 per cent, wherefore it can be concluded 
that the population change since 1940 must have been slight, lying 
with 95 per cent probability between a decrease of 5 per cent and an 
increase of 3 per cent. 
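
The interval quoted above follows from the estimated ratio and its standard error. The sketch below assumes a normal approximation with the multiplier 1.96, which the text does not state explicitly; the names are illustrative.

```python
b = 0.988      # estimated ratio of 1946 to 1940 population
se_b = 0.021   # standard error of b
z = 1.96       # normal multiplier for 95 per cent (assumed)

# Percentage change since 1940 at the two ends of the interval.
low_change = (b - z * se_b - 1) * 100    # about -5.3 per cent
high_change = (b + z * se_b - 1) * 100   # about +2.9 per cent
```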
Chart III shows the age-sex pyramids for 1946, 1928, 1920, and 1907
(unfortunately the census of 1940 was never tabulated by age).
Greece’s population of age 0-9 shows a serious deficit. This deficit
can be laid to two factors: (i) the rapid decline of the birth rate, which
fell from an average of 29.5 per thousand in 1931-35 to 23.5 per thou-
sand in 1939; and (ii) the high mortality rate of children during the
war.⁶ The importance of public health measures to conserve the popu-
lation of children now aged 0-9 is apparent.


TABLE III
ESTIMATED STANDARD ERRORS FOR VARIOUS CHARACTERISTICS

                                                      Estimate in   Standard error as
Characteristic                                         thousands    percentage of estimate

Population
  15-19 male and female                                    825            3.4
  20-29 male and female                                   1197            3.2
  50-59 male and female                                    534            3.9
  20-29 male                                               560            3.9
  50-59 male                                               258            4.7
Literacy
  14-19 female, literate                                   432            4.6
  14-19 male, illiterate                                    47           13.6
  20-39 male and female, literate                         1757            3.4
  20-39 male, literate                                     966            3.0
  20-39 female, literate                                   791            4.9
  20-39 male, illiterate                                    90           10.5
  20-39 female, illiterate                                 358            6.4
Labor Force
  20-39 male, in labor force, at work                      926            4.4
  20-39 male, in labor force, not at work                   78           13.9
  20-39 male and female, in labor force, at work          1150            4.3
  20-39 male and female, in labor force, not at work       101           14.4
  40-59 male, in labor force, at work                      618            4.6
  40-59 male, in labor force, not at work                   40           17.7
  20-39 male, employer of others                            39           20.6
  40-59 male, employer of others                            38           26.5
  14-19 male, employed by others                           175            8.0
  14-19 female, employed by others                          69           14.0
  20-39 male, employed by others                           452            6.6
  40-59 male, employed by others                           214            9.4
Sex Ratio
  0-4                                                     1.18            2.0


Calculations made by Dr. T. Nicholas Panay, who assisted the au-
thors in Athens, indicate that if the logarithmic rate of growth deter-
mined by the censuses of 1928 and 1940 had been maintained through
1946, the population of Greece in July 1946 would have been 8 million,
or ¾ million higher than the population determined by the sample.
This estimated ¾ million may be largely ascribed to losses due to the
war and occupation both in actual deaths, military and civil, from all
causes, and in deficit arising from the decreased number of births.

⁶ The authors are indebted to Dr. Dudley Kirk of the Office of Population Research for a number of
helpful suggestions, such as pointing out the decline in the birthrate of Greece, and the sex composition
of the immigrants and emigrants (mentioned in a later paragraph).


CHART III
THE POPULATION OF GREECE BY AGE GROUPS AT DIFFERENT DATES
AREAS SHOW PERCENTAGES. ONE UNIT ON THE HORIZONTAL BY A 5-YR
INTERVAL ON THE VERTICAL EQUALS ONE PER CENT


Table V shows the sex-ratio in the various age classes, and compari-
sons with 1928, 1920, and 1907. It should be remembered that between
the censuses of 1920 and 1928 huge interchanges of population took
place between Greece, Turkey, and Bulgaria. The net result of these
transfers was to increase the population of Greece by nearly a third.
There are at least three factors operating to bring about a low sex-ratio
in the young adult classes: (i) war; (ii) the refugees from Asia Minor
had a lower sex-ratio than native Greeks; (iii) emigration, which has
included a disproportionately large number of young men. As in cen-
suses in most countries, an under-count of young children must be pre-
sumed in the observations reported here. Moreover, it appears by
comparing the figures for male and female children, or by observing
from Table V that the sex-ratio in the age group 0-4 is 1.18, that the
under-count must have been more pronounced for female children than
for male. The phenomenon may be exaggerated by the sampling errors,
but the pattern by sex and age classes for 1928, 1920, and 1907 (Table
V) indicates that relative under-reporting of female children has ex-
isted in previous censuses as well. A possible alternative, not to be
thrown out summarily, is that perhaps because of special conditions
of nutrition in Greece, there has been unusually high mortality of fe-
male children in the past several years, and perhaps for many years.
Unfortunately, the birth and death records are incomplete and the de-
gree of under-reporting of young children cannot be ascertained.


TABLE IV
POPULATION OF GREECE BY SEX AND AGE
(July 16 to Aug. 7, 1946)
(All figures in thousands)

   (1)          (2)       (3)       (4)
Age group      Total      Male     Female

All ages        7257      3599      3658
 0- 4            634       343       291
 5- 9            721       372       349
10-14            842       434       408
15-19            825       392       433
20-29           1197       560       637
30-39           1008       496       512
40-49            813       425       388
50-59            534       258       276
60 & over        683       319       364


TABLE V
THE SEX RATIO (MALE:FEMALE) BY AGE GROUPS IN GREECE
AT VARIOUS CENSUSES

   (1)          (2)       (3)       (4)       (5)
Age group       1946      1928      1920      1907
               AMFOGE    Census    Census    Census

All ages        0.98      0.98      0.99      1.01
 0- 4           1.18      1.04      1.09      1.06
 5- 9           1.07      1.05      1.08      1.08
10-14           1.06      1.09      1.10      1.07
15-19            .91       .98       .95       .80
20-29            .88       .93       .87       .93
30-39            .97       .90       .86      1.03
40-49           1.10       .96      1.04      1.09
50-59            .93      1.04      1.08      1.11
60 & over        .88       .94       .94      1.07

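
The 1946 column of Table V can be recomputed from the male and female counts of Table IV. The sketch below uses only figures printed in those two tables; the dictionary layout and names are ours.

```python
# (male, female) in thousands, from Table IV
TABLE_IV = {
    "All ages":  (3599, 3658),
    "0-4":       (343, 291),
    "5-9":       (372, 349),
    "10-14":     (434, 408),
    "15-19":     (392, 433),
    "20-29":     (560, 637),
    "30-39":     (496, 512),
    "40-49":     (425, 388),
    "50-59":     (258, 276),
    "60 & over": (319, 364),
}

# Sex ratio = males per female, rounded to two decimals as in Table V.
sex_ratio = {age: round(m / f, 2) for age, (m, f) in TABLE_IV.items()}
```

The value for ages 0-4 comes out well above the ratios for 5-9 and 10-14, which is the feature the text attributes to a heavier under-count of female children.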

This was a household sample, but all members of the household not
having permanent residence elsewhere were to be counted at home,
whether actually living at home or not at the time of enumeration.
This approach gave a count of the armed forces and obviated the need
for special counts of armed bands and prisons, which would have been
impossible. Monasteries came into the sample in the regular way. The
number of homeless children in asylums has since been determined and
it turns out to be under 14,000, of which roughly two-thirds are male.
This information was obtained by Dr. Panay, mentioned earlier, who
kindly consulted the proper government officials in Athens and even
ascertained some figures directly from heads of institutions in various
parts of Greece where it seemed desirable to do so.


TABLE VI
LITERACY OF THE GREEK POPULATION
(July 16 to Aug. 7, 1946)
(Absolute figures in thousands)

   (1)           (2)       (3)        (4)       (5)        (6)       (7)
Age group      Total population        Literate            Illiterate
and sex        Number   Per cent    Number   Per cent    Number   Per cent

8 and over      6219      100        4470      71.9       1749      28.1
  Male          3042      100        2570      84.5        472      15.5
  Female        3177      100        1900      59.8       1277      40.2
8-13             984      100         774      78.6        210      21.4
  Male           505      100         410      81.2         95      18.8
  Female         479      100         364      76.0        115      24.0
14-19           1000      100         864      86.4        136      13.6
  Male           479      100         432      90.2         47       9.8
  Female         521      100         432      82.9         89      17.1
20-39           2205      100        1757      79.7        448      20.3
  Male          1056      100         966      91.5         90       8.5
  Female        1149      100         791      68.8        358      31.2
40 & over       2030      100        1075      53.0        955      47.0
  Male          1002      100         762      76.0        240      24.0
  Female        1028      100         313      30.4        715      69.6


Tables VI-IX show the results obtained for various other population
characteristics. In all instances the answers given in the household were
accepted.⁷ In regard to literacy for example, if the information ob-
tained was that a particular person could read and write in any lan-
guage, there was no resort to any test.

Likewise a man claiming to be employed was not asked for any
proof; in particular, he was not asked how many hours he worked last
week. There was thus no measure of under-employment. The employ-
ment figures by themselves do not give an adequate picture of the
degree of employment. Under the conditions existing, there could very
well be some upward distortion in the number of people claiming to be
employed and self-employed. Many people, especially men and boys,
undoubtedly reported themselves as employed when in fact they were
only working part-time as peddlers. On the other hand, some of them
were doing this sort of work from choice rather than to accept some
regular form of employment at the prevailing low wages. Unfortunately
it was impossible to ask for the number of hours worked per week; this
information, however valuable, would have crowded the schedule and
complicated the work beyond the allowable tolerances.

⁷ This statement applies only to the population characteristics, not to information obtained con-
cerning registration. As already stated, information obtained in the household regarding registration
was compared with information from the registers.


TABLE VII
EMPLOYMENT STATUS OF THE GREEK POPULATION
14 YEARS OF AGE AND OVER
(July 15 to Aug. 7, 1946)
(Absolute figures in thousands)

   (1)           (2)        (3)        (4)        (5)       (6)      (7)       (8)        (9)
                           In the labor force     Not in the labor force     Per cent of labor force
Age            Total       At work    Not at      Normally  House-   Other     At work    Not at
and sex        population  or with    work but    in        wives              or with    work but
                           a job      seeking     school                       a job      seeking
                                      work                                                work

14 and over      5235       2466       197         282      1391      899       92.6       7.4
  Male           2537       2032       147         175         -      183       93.3       6.7
  Female         2698        434        50         107      1391      716       89.7      10.3
14-19            1000        360        40         241        34      325       90.0      10.0
  Male            479        262        20         143         -       54       92.9       7.1
  Female          521         98        20          98        34      271       83.0      17.0
20-39            2205       1150       101          41       639      274       91.9       8.1
  Male           1056        926        78          32         -       20       92.2       7.8
  Female         1149        224        23           9       639      254       90.7       9.3
40-59            1347        705        45           -       531       66       94.0       6.0
  Male            683        618        40           -         -       25       93.9       6.1
  Female          664         87         5           -       531       41       94.6       5.4
60 and over       683        251        11           -       187      234       95.8       4.2
  Male            319        226         9           -         -       84       96.2       3.8
  Female          364         25         2           -       187      150       92.6       7.4


TABLE VIII
NATURE OF EMPLOYMENT OF THE GREEK POPULATION
OF AGE 14 AND OVER
(July 15 to Aug. 7, 1946)
(All figures in thousands)

   (1)            (2)     (3)     (4)       (5)     (6)     (7)       (8)     (9)    (10)
                  Employer of Others        Employed by Others        Self-employed
Age              Total    Male   Female    Total    Male   Female    Total    Male   Female

Total,
14 and over       114     106      8       1127     877     250      1225    1049     176
14-19               3       2      1        244     175      69       113      85      28
20-39              43      39      4        590     452     138       517     435      82
40-59              39      38      1        251     214      37       415     366      49
60 and over        29      27      2         42      36       6       180     163      17


TABLE IX
EMPLOYMENT OF THE GREEK POPULATION IN CERTAIN INDUSTRIES
(July 15 to Aug. 7, 1946)
(All figures in thousands)

   (1)              (2)      (3)     (4)     (5)     (6)     (7)      (8)     (9)    (10)    (11)    (12)
                   Total,                   MALE                                    FEMALE
Industry           14 and   14 and  14-19   20-39   40-59   60 and   14 and  14-19   20-39   40-59   60 and
                   over     over                            over     over                            over

Agriculture        1436     1183     187     507     332     157      253      64     117      55      17
Woodswork            40       38       2      18      17       1        2       1       1       -       -
Fishing              25       25       5      10       7       3        -       -       -       -       -
Mining, including
  Salt Mining        13       13       1       5       6       1        -       -       -       -       -
Manufacturing       141       94      14      48      28       3       47      14      27       4       2
Construction         60       59       4      23      23       9        1       -       1       -       -
Transportation       79       79       3      37      35       4        -       -       -       -       -
Business            341      307      23     137     108      39       34       7      20       6       1


In regard to employment status (Table VII) it should be remarked
that the classification was decided upon after conferences with officials
in the Greek government. The figures given in Table IX are not ex-
haustive; that is, the Army, Navy, and certain miscellaneous and
unclassified industries have been omitted. The figures for “Normally in
school” in Table VII are to be interpreted as the number of males and
females 14 and over intending to go back to school in the fall. Some of
them were, of course, at work at the time the observations were made,
but they were, nevertheless, classed as normally in school. No one was
given a double assignment.

No family characteristics were tabulated, but a count of heads by 
sex showed 83 per cent of the heads to be male. The number of people 
in the sample of all ages was 13,813 and the number of households was 
3052; accordingly the average size of household is 4.2 people, with an 
insignificant sampling error (not actually calculated). This figure, of 
course, includes in-laws and friends and lodgers living under the same 
roof, sharing expenses, food and household facilities. 


APPENDIX A: BLOCKS DRAWN WITH PROBABILITIES 
IN PROPORTION TO SIZE (SCHEME 2a) 


The Sample for Parish Ayios Panteleemous in the Demos of Ayios
Georgios Keratsiniou

This appendix shows the actual drawing of the household sample in 
a sample parish in the Athens-Piraeus metropolitan district. Blocks 
were drawn from the sample parish with probability proportionate to 
their estimated sizes (Scheme 2a), and households were then drawn 
from the sample blocks. The procedure of selection is illustrated in the 
next sections. 

The demos Ayios Georgios Keratsiniou is part of the metropolitan 
area of Athens-Piraeus. Because of its size (Class 6) it had a probability 
of unity (certainty) of coming into the sample. There were five parishes 
in this demos, and one was drawn at random (Parish Ayios Pantelee- 
mous). The sampling rate within the sample parish was assigned as 1 
household in 100, satisfying the requirement that the over-all probabil- 
ity of a household being selected from the demos should be 1 in 500, as 
expressed by the equation 

$$\frac{1}{1} \times \frac{1}{5} \times \frac{1}{100} = \frac{1}{500}$$

The first fraction is the probability (unity) of the demos being in the 
sample. The second fraction is the probability of the parish being in 
the sample; and the third fraction is the probability for the selection of 
households from within the sample parish. Had sizes of parishes (as of 
1940) been known at the time the sample was drawn it might have been 
tempting to draw the parishes from within demoi with probability 
proportionate to their 1940 sizes, but the figures were not on hand till 
later, and moreover, there were often complications arising from changes 
in boundaries of parishes since 1940, rendering this suggestion a diffi- 
cult one to carry out. 

The first step was to obtain or make a map of the sample parish and 
to number the blocks in serpentine fashion. The second step was to
cruise the parish with a jeep to estimate the number of households in 
each block, making stops only where necessary to get fair approxima- 
tions to the actual number of households, resulting in a table of esti- 
mated households (Table X). 

It will be noted that Blocks 54, 55, and 56 were tied together and
came into the sample as a unit. With this type of sampling scheme small
blocks should be tied with other small ones or with a large one, for two
reasons: first, because it is essential that a sample block (or group of
tied blocks) contain enough households to provide the required number
for the sample (Col. 3 of Table XI); second, because of the usual high
correlation between households within a small block. A decrease in the
standard error of sampling can be brought about by enlarging the
block, although it should be noted that failure to heed this advice does
not result in bias. The tying should be done before the sample blocks
are selected, otherwise bias can be expected.


TABLE X
ESTIMATED HOUSEHOLDS BY BLOCKS: PARISH AYIOS PANTELEEMOUS
(This table was brought in by the observer)

   (1)       (2)         (3)            (1)       (2)         (3)
 Block    Estimated   Cumulative      Block    Estimated   Cumulative
 number   households    total         number   households    total

   1         14          14             50        20          555
   2         20          34             51        20          575
   3         10          44             52         3          578
   4         13          57             53        25          603
   5         12          69             54 (s)     3          606
   6          5          74             55 (s)    12          618
   7         20          94             56 (s)     3          621
   8         15         109             57         0          621
   9         23         132             58         0          621
  10         16         148             59         0          621
  11         29         177             60        22          643
  12         15         192             61        11          654
  13         18         210             62         8          662
  14         15         225             63        17          679
  15         22         247             64        19          698
  16         17         264             65        14          712
  17         20         284             66        12          724
  18 (s)     23         307             67         8          732
  19         12         319             68         8          740
  20          1         320             69         9          749
  21          0         320             70         8          757
  22          2         322             71         8          765
  23          0         322             72         6          771
  24         14         336             73         9          780
  25         19         355             74         7          787
  26          0         355             75         8          795
  27         11         366             76         7          802
  28         19         385             77         6          808
  29          1         386             78         0          808
  30          8         394             79         0          808
  31          7         401             80         0          808
  32          5         406             81         0          808
  33          2         408             82         0          808
  34          4         412             83         7          815
  35          4         416             84        37          852
  36          1         417             85         6          858
  37          0         417             86         2          860
  38          7         424             87         2          862
  39          2         426             88         0          862
  40          0         426             89        17          879
  41          0         426             90        12          891
  42         38         464             91         7          898
  43          0         464             92         9          907
  44          0         464             93        11          918
  45         16         480             94        10          928
  46         16         496             95         3          931
  47         20         516             96 (s)    13          944
  48         19         535             97        12          956
  49          0         535             98        10          966


The sample blocks are designated by the letter s in this table. They 
were selected by applying the constant interval 322 to the cumulative 
totals, starting with the random number 288 (selected between 1 and 
322) according to the tabular scheme shown below. 


Designation of the sample blocks and households

Sampling rate within the sample parish, (1/500) × (5/1)        1/100
Cumulative total of estimated households                        966
Estimated number of households in the sample, 966/100           10
Take 3 blocks (a convenient work load)
Sampling interval for blocks, 966/3                             322
Random start                                                    288

The sampling plan for the parish is summarized in Table XI, which 
constitutes a set of instructions actually handed to the observer, who 
was instructed to map-list the sample blocks, including and identify- 
ing every household with a number, starting afresh with 1 in each 
block. The estimated 10 sample households were assigned to the three 
blocks, with approximate equality (4, 3, 3). The numbers in the last 
column designate the sample households, which were obtained by us- 
ing a random start in each block and applying the designated sampling 
interval. The sampling interval to be applied in any sample block was 
simply the total estimated households in that block divided by the 
estimated number of households in the sample. If the block turns out 
to contain twice as many households as were estimated in advance, the 
actual number of sample households will also be twice the estimated 
number, because the sampling interval is not to be altered. 

The actual number of sample households may be more or less than
the number designated in the column “Estimated households in sam-
ple” of Table XI, depending on the accuracy of the preliminary esti-
mates. The specified sampling interval was to be applied to the actual
number of households found in the block, even if the block were found
to contain twice as many households as estimated (last column of
Table XI). The sampling interval is fixed by the preliminary estimates,
which, because they also determine the probabilities of selection, leave
the sample unbiased.
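
The effect of holding the sampling interval fixed can be illustrated with the tied Blocks 54, 55, 56, for which Table XI gives 18 estimated households, an interval of 6, and the sample households 2, 8, 14, 20, and so on. The function below is a minimal sketch with a name of our choosing; the count of 36 households is a hypothetical outcome of the map-listing.

```python
def sample_households(actual_households, interval, start):
    """Household numbers drawn by a fixed interval from a random start.

    actual_households is the number found when the block is map-listed,
    which may differ from the preliminary estimate.
    """
    return list(range(start, actual_households + 1, interval))

as_estimated = sample_households(18, interval=6, start=2)   # [2, 8, 14]
twice_as_big = sample_households(36, interval=6, start=2)   # hypothetical count
```

With the block at its estimated size the sample contains the three households designated in advance; if the block holds twice as many households, the same interval yields six.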
TABLE XI
DESIGNATION OF WITHIN-PLACE SAMPLE

   (1)            (2)        (3)          (4)          (5)
  Block         Estimated households    Sampling    Households in
  number         Total    In sample     interval    the sample

  18               23         4            6        ...
  54, 55, 56       18         3            6        2, 8, 14, 20, ...
  96               13         3            4        ...






APPENDIX B: BLOCKS DRAWN WITH EQUAL
PROBABILITIES (SCHEME 2b)


In this scheme no table of estimated households for each block is
required. In the above sample parish, a work-load of 10 blocks might
be considered adequate for this plan. There being 98 blocks in this par-
ish, the sampling interval for blocks would be 98/10 = 10. With a ran-
dom start of 4, Blocks 4, 14, 24, 34, ..., would be in the sample. They
would then be map-listed and the households numbered continuously
from the first to the last household in each sample block. The
sampling interval for households would be 100/10 = 10. With 7 as a
random start, households numbered 7, 17, 27, 37, ... would be the
sample households for Block 4, for example. The sample households for
the other blocks would be drawn in the same way with different
random starts.
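
A minimal sketch of Scheme 2b for this parish follows. The function name is ours, and the figure of 43 households in Block 4 is hypothetical, since under this scheme block sizes are not known until the sample blocks are map-listed.

```python
from fractions import Fraction

def systematic(n_items, interval, start):
    """Item numbers start, start + interval, ... up to n_items."""
    return list(range(start, n_items + 1, interval))

# Blocks: 98 in the parish, interval 10, random start 4.
sample_blocks = systematic(98, interval=10, start=4)

# Households within Block 4, assuming 43 were found on map-listing
# (hypothetical), with interval 10 and random start 7.
households_block_4 = systematic(43, interval=10, start=7)

# Over-all probability: parish 1 in 5, blocks 1 in 10, households 1 in 10.
overall = Fraction(1, 5) * Fraction(1, 10) * Fraction(1, 10)
```

The product of the three rates reproduces the required over-all rate of 1 household in 500 for the demos.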


Notes in regard to Schemes 2a and 2b


In the preliminary work of cruising the area preparatory to drawing
blocks with probability in proportion to size (Scheme 2a), the amount
of time, and hence the cost of cruising, can theoretically range all
the way from nothing at all to a high amount when an over-meticulous
estimate is made of every block. In the former case the blocks
are simply assigned equal sizes; and drawing them with probability in
proportion to size is then identically the same operation as drawing
them at random. As a matter of fact, drawing blocks with equal proba-
bilities is usefully regarded as a modification of drawing them with
probabilities in proportion to estimated sizes.
A limited amount of preliminary cruising will often net considerable
saving by eliminating the need for a large sample of blocks. The fewer
the blocks in the sample, the easier the job, because every sample block
must be carefully map-listed, even if only one sample household is to
be drawn out of it. Hence, in place of cruising to make a fair estimate
of the size of every block, and to draw blocks with probability in pro-
portion to size, one may well prefer to make a quicker cruise for the
purpose of tying small but not necessarily contiguous blocks together
on the map with blue pencil to build artificial blocks of approximately
equal size and then to draw these artificial blocks with equal probabili-
ties. It is also well to mark the zero-areas on the map (those that ob-
viously contain no households), and to exclude them from the sample
by skipping them in the numbering of the blocks, which takes place at
this stage. Whether one designates this plan as 2a or 2b is unimportant.
In case of doubt, it is probably advisable to cruise the area for the
purpose of making quick estimates for every block, and to apply
Scheme 2a. This plan has the advantage of being definite and requiring
less supervision than the plan of partially equalizing the sizes of blocks.
It should be pointed out that the plan to be recommended may well
depend on whether there is to be only one survey or a continuing series
of monthly or quarterly surveys. In the latter event, the greater first-
cost of Scheme 2a should be regarded as an investment that may pay
big dividends if it is used often enough, because of the smaller samples
required. Under either Scheme 2a or 2b the household listings can be
sampled over and over until exhausted. Here appears another reason
for tying small blocks together or to large ones in Scheme 2a: the re-
sulting larger sampling interval for households within the block (last
column of Table XI) permits more samples to be drawn for future sur-
veys, without drawing and listing a new sample of blocks.
ESTIMATING THE RESIDENT ALIEN POPULATION
OF THE UNITED STATES*


E. P. HUTCHINSON
University of Pennsylvania
AND
ERNEST RUBIN
Immigration and Naturalization Service, Department of Justice


The alien population of the United States on June 30, 1945
is estimated on the basis of the 1940 alien registration; reported
arrivals, departures and naturalizations of aliens after 1940;
and an estimate of alien mortality. The estimate refers to the de
jure rather than the de facto alien population, for it does not
include temporary visitors and aliens illegally in the United
States. The problem of estimating the number of aliens is
shown to differ in several respects from that of estimating the
total population. Although immigration, emigration, natural-
izations and deaths are the major sources of change in the
size of the alien population, an examination of immigration
and nationality law and of official statistics reveals no less
than 25 classes of alien arrivals, departures and changes of
citizenship status that are not included in the officially re-
ported totals of immigration, emigration and naturalization.
Allowance for certain of these special classes of alien move-
ments should be made in estimating the resident alien popula-
tion of the United States.


Between August 27 and December 26, 1940, over five million
aliens registered in compliance with the Alien Registration Act
of that year. Of these aliens approximately 4,890,000 were in conti-
nental United States. Since that time considerable changes have
taken place in the size of the alien population, which has received some
additions through immigration and has been considerably decreased by
deaths, departures, and naturalizations. Inasmuch as not all changes
in the number of aliens can be fully accounted for, the number in the
United States at any time is not known precisely and can only be
estimated. By methods described below, it is estimated that the resi-
dent alien population of continental United States was reduced to
between 3,400,000 and 3,500,000 by July 1, 1944, a decrease of over
1,400,000 in the 3½ years following the original registration. A further
decrease of about 300,000 is estimated to have taken place by mid-
year of 1945.


* The opinions expressed in this article are those of the authors, and do not necessarily represent 
the views of the Immigration and Naturalization Service. 
For purposes of estimation, an alien population differs in several re-
spects from a general population. The latter, defined on an areal basis,
can be entered only by birth or immigration and left only by death or
emigration. An alien population, however, is a status group, and can
be entered or left through a change in citizenship status as well as by
physical movement into or out of the group. In the present case
naturalization is a major factor; in recent years it has exceeded death
and migration combined as a source of change in the number of aliens
in the United States. A further difference, at least under United States
law, is that the alien population is not increased by births, the reason
being that the American-born children of alien parents acquire citizen-
ship at birth.¹ With minor exceptions, the alien population of the
United States is recruited only by immigration. Decrease takes place
primarily through the death, departure, and naturalization of aliens.
An estimate of the number of aliens in the United States at any time
after 1940 can therefore be obtained by adding to the original registra-
tion total the amount of subsequent immigration, subtracting natural-
izations and alien emigration, and making allowance for the deaths of
aliens during the interval after registration. The application of this
general procedure is described below with special reference to pro-
cedural problems peculiar to an estimate of an alien population, and
with a listing of the special categories to be included for a reasonably
full accounting for alien population movements in the United States.


ORIGINAL REGISTRATION DATA 


The final total of alien registrations in continental United States
as of December 31, 1940 was 4,889,770,² which total includes registra-
tions during the period of August 27 to December 26 inclusive, alien
arrivals during the remainder of December, and the delayed registra-
tions of aliens who were in the United States during the registration
period. The question arises as to the reliability of this figure, especially 
in view of the considerably smaller number of aliens enumerated in the 
Federal census of the preceding April. The penalties provided for non- 
compliance with the Alien Registration Act were naturally such as to 
secure a fuller reporting of alienage, while at the same time there were 
undoubtedly elements of overcounting in the registration total. It is 
known that some registrations were made through error or because of 
uncertainty about citizenship. Persons whose obligations under the
Act were not clearly defined were advised to register, and were included
in the reported totals.3 Inflation of the registration totals also occurred
because of the four-month registration period, during which time
registered aliens may have died, become naturalized, or left the United
States without their names being removed from the register. In so far
as deaths during the registration period are concerned, an approximate
correction can be made as described below. With regard to naturaliza-
tions, application for citizenship in the form of either first or second
papers did not exempt an alien from registration, but aliens whose
petitions for naturalization were pending and who expected to be
naturalized before December 26, 1940, were instructed not to register.
The overloading of the total with temporary visitors was reduced by
the limitation of the registration requirement to aliens staying in the
United States for thirty days or more, but registered aliens departing
from the United States during the registration period would have re-
mained in the registration total.4 Alien seamen were registered on a
special form, and registrations on this form were not included in the
reported totals.

1 With the exception of American-born children of foreign diplomats. G. H. Hackworth, Digest of
International Law, Vol. 3, p. 12 (Department of State, Publication 1708, Government Printing Office,
Washington, 1942).

2 All figures are from the United States Department of Justice, Immigration and Naturalization
Service, unless otherwise noted.

On the other hand, there were sources of undercounting in the regis- 
tration figures. Officials of foreign governments together with members 
of their families were exempt from registration. Of more importance 
numerically was the undoubted failure of some aliens to register. Some 
residents of the United States may have believed themselves to be 
citizens when in fact they were aliens. Other aliens may have failed to 
register through negligence or through ignorance of the requirement. 
Aliens who passed as citizens may have preferred not to register and 
thereby reveal their alienage. There is also the not inconsiderable 
number of aliens illegally in the United States who had reason to avoid
registration. A large proportion of such persons illegally in the country 
who have been apprehended since 1940 are found to be unregistered. 

With allowance for the sources of overcounting and undercounting, it 
is believed that the 1940 registration figures more probably understated 
than overstated the number of aliens then in the United States, but 
there is not sufficient basis for a correction of the totals to be attempted. 
The original 1940 registration of 4,889,770 has, therefore, been used 
without correction as the base figure for estimating the alien population 
in later years. 

3 For example, women who lost citizenship before September 22, 1922 through marriage to aliens,
but whose citizenship was restored by the Act of June 25, 1936 (49 Stat. 1917) as amended by the Act
of July 2, 1940 (54 Stat. 715). Some native-born and therefore citizen children of alien parents were
erroneously registered.

4 In this connection it should be noted that by the terms of the Act, aliens leaving the United
States before the expiration of the registration period (December 26, 1940) were under no obligation to
register.


DE FACTO OR DE JURE BASIS


Among the aliens in the United States at any time are not only 
permanent or semi-permanent residents but also temporary visitors 
whose duration of stay is not more than a few days or a few hours. 
This presents the question of whether the estimate of alien population 
is to be on a de facto basis to include all aliens physically present in the 
country at the specified time, or whether it is to be on a de jure basis 
to include only those aliens of permanent or semi-permanent residence. 
The former basis would give a larger figure than the latter, although in 
some cases, such as that of resident aliens serving with the armed forces 
overseas or aliens temporarily out of the country, de jure residence 
continues in spite of de facto absence. In so far as the original registra- 
tion totals of 1940 are concerned, there is no choice of which basis to 
use; the registration requirement applied only to aliens remaining in 
the United States for thirty days or longer,5 and the alien registration
statistics give no classification of residence according to intended dura- 
tion of stay. Alien arrivals and departures, however, are separately 
reported according to the temporary or permanent character of the 
movement. 

Official statistics of aliens entering the United States record three 
classes of arrivals with respect to intended duration of stay. These are 
designated “immigrants,” “non-immigrants,” and “non-statistical 
aliens.” The immigrants are aliens admitted for permanent residence.
The non-immigrants are aliens admitted for temporary stay, usually in 
excess of thirty days, under Sections 3 and 4(e) of the Immigration 
Act of 1924. Classed as non-immigrants are officials of foreign govern- 
ments together with entourage, temporary visitors for business or 
pleasure, aliens in transit through the United States, treaty merchants, 
seamen and students. Aliens returning to an established residence in 
the United States after a temporary visit abroad are also recorded as 
non-immigrants. The third class, non-statistical alien, is made up of 
local border crossers in possession of border-crossing cards, together 
with other aliens entering for a declared stay of less than thirty days. 
In the aggregate such temporary admissions far exceed the combined 
totals of the other two classes of alien arrivals. 

Departing aliens are similarly classified, but full correspondence 
between the class of arrival and the class of departure is not achieved. 
The emigrant alien for purposes of statistical reporting is defined as one 
leaving after a residence of one year or more in the United States, and 
with the declared intention of residing permanently abroad. The class
of departure, therefore, relates to the duration of stay rather than to
the class of arrival. The so-called non-emigrants are those aliens leaving
after a stay of less than one year in the United States, together with
resident aliens departing with the declared intention of returning with-
in one year. The departures of aliens belonging to the non-statistical
or border crosser category are not fully reported, and their numbers
are not known.

5 Sec. 31, Alien Registration Act of 1940 (54 Stat. 673-74).

In so far as possible, the de jure or resident basis was adhered to in 
estimating the alien population of continental United States after 
1940.6 Only aliens of the immigrant category, those admitted for per- 
manent residence in the United States, were added to the estimated 
resident alien population; and only the emigrant aliens were subtracted. 
The final estimate is not entirely on a resident basis, however, inas- 
much as the 1940 registrations from which the estimate proceeds in- 
cluded non-resident aliens who remained in the United States for 
thirty days or more. 

RECORDED ALIEN MOVEMENTS, 1940-1945 


The changes in the alien population arising from immigration, emi-
gration, and naturalization are directly recorded. As shown in Table 1,
the number of aliens in the United States and possessions decreased
1,309,527 between January 1, 1941 and July 1, 1945 by reason of an
excess of naturalizations and emigration over immigration. Of this de-
crease, 7,478 was in the outlying possessions, 1,302,049 in continental
United States.7


TABLE 1

RECORDED CHANGES IN THE ALIEN POPULATION OF THE UNITED STATES,
INCLUDING POSSESSIONS,* JANUARY 1, 1941-JULY 1, 1945

                    Jan.-June     Fiscal     Fiscal     Fiscal     Fiscal
                         1941       Year       Year       Year       Year      Total
                                    1942       1943       1944       1945
Increase
  Immigrants           25,833     28,781     23,725     28,551     38,119    145,009
Decrease
  Emigrants             8,443      7,363      5,107      5,669      7,442     34,014
  Naturalizations     157,844    270,364    318,933    441,979    231,402  1,420,522
Net Decrease          140,444    248,946    300,315    419,097    200,757  1,309,527

* Alaska, Puerto Rico, Hawaii and Virgin Islands.
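
As a check on Table 1, the totals can be recombined directly. The following is only an arithmetic sketch using the Total column of the table and the outlying-possession figures given in footnote 7; the variable names are illustrative.

```python
# Totals for the United States including possessions (Table 1, Total column).
immigrants = 145_009
emigrants = 34_014
naturalizations = 1_420_522

# Net decrease is the excess of naturalizations and emigration over immigration.
net_decrease = naturalizations + emigrants - immigrants

# Outlying possessions (footnote 7): naturalizations + emigrants - immigrants.
possessions_decrease = 5_758 + 2_518 - 798

continental_decrease = net_decrease - possessions_decrease
print(net_decrease, possessions_decrease, continental_decrease)
# 1309527 7478 1302049
```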


6 The de jure basis was preferred because of the incomplete record of non-statistical aliens. A further
reason for this choice was that information on age, necessary for the estimate of mortality, was available
only for immigrants and emigrants.

7 In Alaska, Puerto Rico, Hawaii and the Virgin Islands there were 5,758 naturalizations, 798
immigrants and 2,518 emigrants between January 1, 1941 and July 1, 1945.


ESTIMATE OF ALIEN MORTALITY


Following the 1940 registration a continuous or current register of 
aliens was established, and was maintained through the addition to the 
register of new arrivals and the removal of alien departures, naturaliza- 
tions and reported deaths. Report of immigration, emigration and 
naturalization was received through the operations of the Immi- 
gration and Naturalization Service, and instructions called for the 
surrender of alien registration cards in the event of death. It be- 
came apparent, however, that there was an incomplete reporting of 
deaths to the register. During the first twenty-six months after 1940, 
for example, only 39,412 deaths were reported, whereas it could be 
estimated from the age distribution of the alien population that mor- 
tality losses must have been about 7,000 per month. Since citizenship 
is not reported on the death certificate, there is no direct source of in- 
formation on the number of deaths of aliens. Alien mortality must, 
therefore, be estimated if changes in the alien population of the United 
States are to be computed. 
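
The extent of the under-reporting can be put in rough numbers. The sketch below simply compares the deaths reported to the register with the losses implied by the approximate monthly figure quoted above; the variable names are illustrative, and the monthly rate is the paper's own approximation rather than an exact count.

```python
reported_deaths = 39_412      # deaths reported to the register
months = 26                   # first twenty-six months after 1940
expected_per_month = 7_000    # approximate monthly loss implied by the age distribution

expected_deaths = expected_per_month * months
completeness = reported_deaths / expected_deaths

print(expected_deaths)          # 182000
print(f"{completeness:.1%}")    # 21.7%
```

On these figures only about one alien death in five reached the register, which is why mortality has to be estimated rather than counted.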

It is generally accepted that the force of mortality is greater for 
the alien than for the citizen because of the less favorable occupational 
and economic status of the former. It is established that the foreign 
born, among whom the aliens are included, have higher death rates 
than the native born;8 and the aliens may have exceeded the foreign
born as a group in mortality. The extent of the mortality differential 
between aliens and foreign born or between aliens and the general 
population, however, is not determined. For purposes of approximation 
the number of alien deaths after the 1940 registration was estimated 
according to the United States Life Tables, 1930-1939 (preliminary) 
for white males and females.9 Although this life table was based on the
mortality of the preceding decade, which was somewhat greater than 
that prevailing in the early 1940’s, it was assumed to underestimate 
somewhat the rate of alien mortality.10 It was nevertheless believed to
give a sufficiently close approximation for purposes of the estimate. 

8 See F. E. Linder and R. D. Groves. Vital Statistics Rates in the United States, 1900-1940. p. 186;
also Statistical Bulletin (Metropolitan Life Insurance Company), September 1944, pp. 5-7.

9 United States Department of Commerce, Bureau of the Census. July 21, 1941.

10 Apart from the probable differential of mortality between aliens and citizens in general, the alien
population of 1940 contained about 2½ per cent of non-whites whose mortality presumably exceeded
that of the white population. Application of the 1940 mortality rates of the foreign born (Linder and
Groves, op. cit.) to the 1940 alien population gave an estimated mortality approximately 2 per cent
greater than that obtained by use of the 1930-1939 Life Tables for white males and females. If the
1940 mortality rates of the foreign born had been used, it would have raised the estimated alien mor-
tality for the 3½-year period, January 1, 1941 to July 1, 1944, by about 7,000.

The application of survival ratios to each sex and age group of the
alien population of 1940 gave an indicated mortality loss of over 50,000
between registration and July 1, 1941,11 and a loss of approximately
90,000 in each fiscal year thereafter (Table 2, Col. 1). Of the original
registrants over 410,000 are thus estimated to have died by mid-
year of 1945. A correction of this mortality estimate is necessary, how-
ever, for the alien population was increased by new arrivals after 1940,
and many of the original registrants had become naturalized or had left
the United States. Adjustment of the mortality estimate to allow for
migration and naturalization changes in the alien population after
1940 (Table 2, Col. 2) gives a corrected estimate of approximately
375,000 alien deaths up to July 1, 1945.12


TABLE 2

ESTIMATED ALIEN MORTALITY, JANUARY 1, 1941-JULY 1, 1945

                          Estimated    Adjustments for    Corrected
                          Mortality     Migration and      Estimate
                                       Naturalization
                              1               2                3
January to July 1941         50,744             403          50,341
Fiscal year 1942             87,876           3,012          84,864
Fiscal year 1943             89,413           6,274          83,139
Fiscal year 1944             90,966          10,925          80,041
Fiscal year 1945             92,450          16,650          75,800
Total                       411,449          37,264         374,185
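
The relation between the three columns of Table 2 is a simple subtraction, which the following sketch reproduces from the printed figures. It does not reconstruct the survival-ratio calculation itself, for which the age detail is not given here; the names used are illustrative.

```python
# (period, estimated mortality, adjustment for migration and naturalization)
table_2 = [
    ("January to July 1941", 50_744, 403),
    ("Fiscal year 1942", 87_876, 3_012),
    ("Fiscal year 1943", 89_413, 6_274),
    ("Fiscal year 1944", 90_966, 10_925),
    ("Fiscal year 1945", 92_450, 16_650),
]

# Column 3 is column 1 less column 2, period by period.
corrected = {period: est - adj for period, est, adj in table_2}

total_estimated = sum(est for _, est, _ in table_2)
total_adjustment = sum(adj for _, _, adj in table_2)
total_corrected = sum(corrected.values())

for period, value in corrected.items():
    print(f"{period:<22}{value:>9,}")
print(f"{'Total':<22}{total_corrected:>9,}")
```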

From the foregoing data, the preliminary estimate of the resident 
alien population of the continental United States on July 1, 1945 is 
derived as follows: 


Original registration, 1940...............................  4,889,770
    Net decrease, migration and naturalization....  1,302,049
    Estimated deaths to July 1, 1945..............    374,185

Total decrease to July 1, 1945............................  1,676,234


Estimated alien population, July 1, 1945..................  3,213,536

Further correction of the estimate is needed, however, since official 
statistics of migration and naturalization, although covering the 
greater part of the changes in the alien population of the United States, 
do not include all changes in alien residence and citizenship status. 
Additional sources of change in the size of the alien population are
noted briefly below, together with estimated or actual totals for the
fiscal years 1941 to 1945 inclusive. For completeness, changes affecting
the temporary as well as the resident alien population of the United
States are listed.

11 The average interval from registration to July 1, 1941 was estimated to be seven months, inas-
much as the bulk of the alien registrations came late in the registration period.

12 The correction is made by taking the net change (immigration minus emigration minus natural-
ization) in each sex and age group for each fiscal year and “surviving” up to July 1, 1945.


FURTHER ADDITIONS TO THE ALIEN POPULATION 


1. Nonimmigrants. As noted above, certain classes of aliens ad- 
mitted to the United States for a temporary period are reported as non- 
immigrants. Admissions of this category, including returning alien 
residents of the United States, were as follows for the fiscal years 1941 
to 1945 inclusive: 


Fiscal Year Nonimmigrants 
1941 100,008
1942 82,457 
1943 81,117 
1944 113,641 
1945 164,247 


2. Local border crossers. Aliens who cross the land borders of the 
United States periodically and who are in possession of border-crossing 
cards are included in this class. For recent years the number of such in-
dividuals has been reported as follows:


Fiscal Year    Total Alien    Active Alien    Intermittent Alien
                 Crossers       Crossers           Crossers
1941              280,387         77,751            202,636
1942              365,554         90,541            275,013
1943              391,510        109,975            281,535
1944              439,295        114,321            324,974
1945              379,355        116,798            262,557


The above figures give the number of individuals, not the number of 
times they crossed the border in a given fiscal year. Admissions in this 
category, counting each separate crossing, were approximately 
20,000,000 per annum in recent years and rose to over 27,000,000 dur- 
ing the fiscal year 1945. The number of departures of this class is not 
reported. 
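
The distinction between individuals and crossings can be illustrated from the figures just given. In this sketch the yearly totals are rebuilt from the active and intermittent counts, and the 1945 crossings are divided by the 1945 individuals; since the text gives only "over 27,000,000" crossings, the resulting average is a rough lower bound, and the variable names are illustrative.

```python
# Fiscal year: (active crossers, intermittent crossers), as reported above.
crossers = {
    1941: (77_751, 202_636),
    1942: (90_541, 275_013),
    1943: (109_975, 281_535),
    1944: (114_321, 324_974),
    1945: (116_798, 262_557),
}

# Total individuals holding border-crossing cards in each fiscal year.
totals = {year: active + inter for year, (active, inter) in crossers.items()}

crossings_1945 = 27_000_000  # "over 27,000,000", so a lower bound
per_person_1945 = crossings_1945 / totals[1945]

print(totals[1945])             # 379355
print(round(per_person_1945))   # 71
```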

3. Alien seamen arrivals. Reported neither as immigrants nor as non-
immigrants are alien seamen admitted temporarily in pursuit of their
calling.13 Arrivals and departures of this class in recent years have been
as follows:


                      Seamen
Fiscal Year    Arrivals    Departures
1941            696,016       687,918
1942            628,018       622,557
1943            564,258       571,180
1944            748,423       758,450
1945            768,921       770,993


It should be pointed out that the above figures are for alien seamen
arrivals and not admissions. The number of seamen actually admitted
is not reported.

13 Sec. 3(5), Immigration Act of 1924.

4. Temporary admission of laborers. During World War II alien 
laborers were admitted for temporary employment in the United 
States. The admissions were for the most part agricultural laborers 
and railway track workers, but Canadian woodsmen and a few other 
special categories of temporary labor were also admitted.14 Such ad-
missions were included in the nonimmigrant totals, with the exception
of Mexican laborers who came in under special agreement with the 
Mexican government and who formed the majority of the temporary 
laborers. As of June 30, 1944 the number of alien laborers in the United 
States for temporary employment was officially reported as 82,098 
agricultural laborers from Mexico, Jamaica, Barbados, the Bahamas, 
and Newfoundland. An additional 48,024 Mexican railway workers had 
been admitted during the preceding twelve months. From Canada 
11,326 woodsmen had entered the United States, but rapid turnover 
kept the number present at any one time at approximately 3,000. The 
admission of smaller numbers of Chinese cooks and cook’s helpers from 
Mexico, and of Newfoundland laborers for employment in mica and 
copper mining had also been authorized. On June 30, 1945 there re- 
mained in the United States 99,434 agricultural laborers, 64,999 rail- 
road track workers, and 17,333 other temporary laborers employed in 
industries and services essential to the war effort. Included within this 
group of 181,757 temporary laborers were over 135,000 Mexicans. 

14 The admissions of temporary laborers were under the 4th and 9th provisos of Sec. 3, Immigra-
tion Act of February 5, 1917; Public Law 45, April 29, 1943; and Title I, Public Law 229, February
14, 1944. The admissions were under the authority of the War Food Administration and the War Labor
Board.

5. Excluded aliens paroled into the United States. Included in this
category are aliens seeking admission to the United States for hospital-
ization or treatment who are technically excludable by reason of their
illness or defect. Other paroled aliens are legal residents returning to the
United States not in possession of the documents required for ad-
mission. In official reports such cases would be classed as exclusions;
full statistics on the number of aliens paroled into the United States 
are not kept. An alien may be paroled either by the Inspector-in- 
Charge at a port or border station, by a local Board of Special In- 
quiry, by the Central Office of the Immigration and Naturalization 
Service, or by the Board of Immigration Appeals. A sample study of 
exclusion cases before the Board of Immigration Appeals over a period 
of six months has shown that of 685 such cases, 192 were paroled into 
the United States. Since this study dealt with appealed cases only, 
it does not afford a basis for estimating the total number of paroles. 

6. Interned aliens. Pursuant to agreements reached between the 
United States and its cobelligerent western hemisphere countries, the 
Departments of State, War and Justice arranged to bring to the United 
States from Central and South America and the West Indies all 
enemy aliens who were to be interned as such during the war. Such 
admissions were not made pursuant to the immigration laws and are 
not included in the reported statistics of immigration. At the end of the 
fiscal year 1944 (June 30) there were 1,439 aliens of this category in 
the United States.15 One year later the number was 1,956.

7. Prisoners of war. As in the case of the interned aliens mentioned 
above, prisoners of war are not admitted under the immigration laws 
of the United States and are not included in the official statistics of 
immigration. The size of this group has not been officially reported 
but was said to have included 350,000 German prisoners of war in the 
spring of 1945.16 

8. Refugees admitted under Presidential directive. The approximately 
1,000 aliens of the Oswego (Ft. Ontario) refugee group who entered 
the United States in August of 1944 under Presidential directive were not 
formally admitted and hence did not appear in immigration statistics. 
On their formal admission under Presidential directive of December 22, 
1945, they were duly recorded as immigrants to the United States.17

9. Foreign military and naval personnel. This class, admitted under 
international courtesy, is not inspected under the immigration laws but 
is permitted to enter temporarily. No figures are available on the num-
ber of aliens who so enter the United States. Aliens of this class who
fail to maintain the status of military or naval personnel are subject
to deportation on the charge of no visa.

15 United States Department of Justice, Immigration and Naturalization Service, Monthly Review,
December 1944, p. 74.

16 New York Times, May 19, 1945, p. 1.

17 Of the 924 aliens in the refugee camp at its termination, about 25 are reported to have asked for
repatriation or to have sought admission to other countries; 19 or 20 were found excludable because
of mental or physical defect but were admitted temporarily under the 9th proviso, Sec. 3, Immigration
Act of 1917; and the remainder were admitted for permanent residence. Common Council for American
Unity, Interpreter Releases, Vol. XXIII, No. 7, February 28, 1946.

10. Extradited aliens. Extradition may occur under state as well 
as Federal jurisdiction. Extradited aliens are generally treated as ex- 
cluded aliens paroled into the United States. Their number is not 
known but is presumably small. 

11. Aliens deported to the United States. The number of such aliens 
entering the United States is not known since the available figures do 
not separate these aliens from American citizens deported to the 
United States. As with the preceding class, the number is presumably 
small. 

12. Arrivals from outlying possessions. The alien population of con- 
tinental United States is further increased by the entry of aliens previ- 
ously admitted to one of the outlying possessions. During the fiscal year 
1945 aliens to the number of 1,722 departed from insular possessions to 
continental United States. Such movements during the preceding years 
were not recorded in comparable form, but the number of aliens in- 
volved was small. 

13. Illegal entrants. The list of sources of increase in the alien popula- 
tion of the United States would not be complete without reference to 
illegal entrants, but accurate estimate of their numbers cannot be made. 
In 1934 and 1935 it was estimated by the Commissioner of Immigra- 
tion that the number of aliens subject to deportation for illegal entry 
was “probably less than 100,000.”18 In June 1946 it was officially esti-
mated that 60,000 aliens had entered the United States illegally during
the current year.19 More recently the Immigration and Naturalization
Service estimated that aliens were still entering the United States
illegally at the rate of more than 10,000 per month and reported that
nearly 14,000 aliens had been arrested for illegal entry within a single
month.20

18 Congressional Record, August 22, 1935, and May 28, 1936.

19 New York Times, June 3, 1946, p. 44.

20 United States Department of Justice, Immigration and Naturalization Service, Form G-23,
August 1946.

14. Aliens violating terms of admission. In addition to aliens who
enter illegally there are legally admitted aliens who violate the terms of
admission and thereby become subject to deportation. For the purposes
of the present estimate this means that aliens admitted for temporary
residence may in fact constitute additions to the resident alien popula-
tion. Included in this category are those who enter as nonimmigrants
or nonstatistical aliens and who overstay their period of admission
without extension of the term of stay; students, employees of foreign
governments, and others who fail to maintain the status under which
they are admitted; and deserting alien seamen. The number of aliens 
who violate the terms of admission and who fail to adjust their status 
is not known, but alien seamen desertions are reported. For recent 
years the number of such cases has been reported as follows: 


Fiscal Year Deserting Alien Seamen 
1941 4,661 
1942 6,987 
1943 5,683 
1944 5,811 
1945 5,577 


15. Denaturalization. The alien population may also be increased 
through loss of citizenship whereby the status of residents of the 
United States is changed from citizen to alien. Loss of citizenship is for 
the most part through the process of denaturalization, by court action 
revoking a naturalization obtained through fraud or illegality. The 
number of denaturalizations in recent years is given below: 


Fiscal Year Denaturalizations 
1941 1,055 
1942 640 
1943 378 
1944 238 
1945 165 


16. Expatriation. Loss of citizenship may also take place through 
expatriation. Under the Nationality Act of 1940 expatriation may be 
effected in the United States by reason of desertion from the United 
States military or naval service in time of war; treason, etc.; or formal 
written renunciation of citizenship.21 Under the latter provision for ex-
patriation which became effective July 1, 1944, United States citizens
of Japanese ancestry were enabled to renounce their citizenship. It is 
estimated that approximately 5,000 persons, for the most part of 
Japanese ancestry, renounced citizenship during the fiscal year 1945.22


21 Sections 401(g), 401(h), and 401(i) of the Act.

22 United States Department of Justice, Immigration and Naturalization Service, 1945 Annual
Report, p. 24.


FURTHER DEDUCTIONS FROM THE ALIEN POPULATION


17. Nonemigrants. Corresponding to the nonimmigrant class of
admissions is the nonemigrant class of departures. Departures of this
class for the fiscal years 1941 to 1945 inclusive were:


Fiscal Year Nonemigrants
1941 71,362
1942 67,189
1943 53,615
1944 78,740
1945 85,920

18. Alien seamen departures. See under Alien Seamen Arrivals (No. 3).


19. Deportations and voluntary departures. Alien deportations and
voluntary departures arising from deportation proceedings under
Section 19 of the Immigration Act of 1917 are not included either as
emigrants or nonemigrants in the official statistics of alien departures
from the United States. The numbers in these classes during recent
years have been as listed below:


Fiscal Year    Deportations    Voluntary Departures
1941                  4,407                   6,531
1942                  3,709                   6,904
1943                  4,207                  11,947
1944                  7,197                  32,270
1945                 11,270                  69,490


20. Removal of indigent aliens and Filipinos. Indigent aliens23 and
Filipinos24 may be returned to their native land at their own request
and at government expense. Persons so removed from the United
States are not included in the totals of alien emigrants or nonemi-
grants. Alien departures of this class have been as follows in the fiscal
years 1941 to 1945 inclusive:


Fiscal Year    Filipinos    Indigent Aliens
1941                 134                152
1942                   0                 30
1943                   0                  5
1944                   0                  4
1945                   0                 12


21. Aliens serving abroad in the armed forces. Figures are not avail- 
able for this group. Although physically absent, such aliens are re- 
garded legally as maintaining their residence in the United States. 


23 Section 23, Immigration Act of 1917 (39 Stat. 892). 
24 Section 1, Act of July 27, 1939 as amended (53 Stat. 1133).

22. Aliens extradited by foreign countries. Information is not avail-
able on the size of this group, which is not included in the reports of
alien departures. The number is believed to be small. 

23. Departures to outlying possessions. During the fiscal year 1945, 
1,566 aliens departed from the United States to insular possessions. 
Comparable figures are not available for earlier years, but the volume 
of movement during 1941 to 1944 was small. 

24. Illegal and nonrecorded emigration. Aliens who leave the United 
States at points other than border stations or ports do so illegally, and 
there is no means of estimating the number of such departures. In 
addition, because of the limited inspection of departures over the land 
borders, alien departures of the emigrant and other classes may go un- 
recorded. 

25. Derivative citizenship. Under the provisions of the Nationality 
Act of 1940 alien children under eighteen years of age and residing in 
the United States derive citizenship through the naturalization of both 
parents or, under certain conditions, on naturalization of one parent.25
Such derivative citizenships constitute reductions in the alien popula- 
tion of the United States, but are not included in the statistics of 
naturalization and are not elsewhere reported. A sample study of 
5,000 recent naturalizations showed only 10 children to have derived 
citizenship. Application of this ratio gives the following estimate of 
derivative citizenships during the fiscal years 1941 to 1945: 


Fiscal Estimated 
Year Derivatives 
1941 554 
1942 541 
1943 635 
1944 871 
1945 463 
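
The ratio method can be illustrated as follows. The sketch applies the sample ratio of 10 derivative citizenships per 5,000 naturalizations to the fiscal year naturalization counts of Table 1 for 1942 and 1945. The paper does not state the exact naturalization base used for each year, and the Table 1 counts include the possessions, so the other years need not be reproduced exactly by this calculation; the function name is an assumption made here.

```python
sample_naturalizations = 5_000
sample_derivative_children = 10

# Derivative citizenships per recorded naturalization, from the sample study.
ratio = sample_derivative_children / sample_naturalizations


def estimated_derivatives(naturalizations):
    """Apply the sample ratio to a recorded number of naturalizations."""
    return round(naturalizations * ratio)


print(estimated_derivatives(270_364))  # fiscal year 1942: 541
print(estimated_derivatives(231_402))  # fiscal year 1945: 463
```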


25 Sections 313 and 314 of the Act (54 Stat. 1145-1146).


ADJUSTED ESTIMATE OF THE ALIEN POPULATION


The volume and diversity of alien movement into and out of the
United States and of changes of citizenship status within the country
are such as to make impossible a full accounting for year to year
changes in the alien population. During the fiscal year 1945, for exam-
ple, not a year of exceptionally heavy travel, the inward movement
over the land borders exceeded fifty-five million individual crossings,
of which one-half were by aliens. The volume of outward movement
over the land borders was presumably of the same order. During the
same year more than 1,100,000 persons—citizens and aliens, passengers 
and seamen combined—arrived by vessel at the seaports of the United 
States. Approximately 400,000 more came by air; and the illegal and 
unrecorded entries were of unknown but presumably considerable 
numbers. In addition to the volume of movement, the many classes of 
arrival, departure, and change of citizenship status make more diffi- 
cult the problem of estimating the number of aliens in the United 
States. 

On the basis of immigration, emigration, naturalization, and esti- 
mated mortality only, the resident alien population of continental 
United States, which stood at approximately 4,890,000 at the end of 
1940, was reduced to about 3,200,000 at July 1, 1945. This estimate, 
however, does not take into account the many classes of alien move- 
ment not included within the reported totals of immigration, emigra- 
tion, and naturalization. Twenty-five such classes have been enumer- 
ated above. For the most part these special classes affect only the 
temporary resident and de facto alien population. Some, however, 
represent permanent or semi-permanent changes of residence or citizen- 
ship status. 

The reported totals of such permanent and semi-permanent changes 
between January 1, 1941 and July 1, 1945 (excluding those for which
no reliable figures are available) were as follows:²⁶


Increase
    [label illegible]................................      1,949
    [label illegible]................................      5,000*
    Arrivals from outlying possessions...............      1,722
        Total increase...............................      8,671
Decrease
    Deportations.....................................     28,569
    Voluntary departures.............................    123,877
    Removal of indigent aliens and Filipinos.........        194
    Derivative citizenship...........................      2,787*
    Departures to outlying possessions...............      1,566
        Total decrease...............................    156,993
        Net decrease.................................    148,322


* Estimate. 


26 Taking one-half of the fiscal year 1941 total for the six-month period January 1 to July 1, 1941.

Modification of the alien population estimate to take into account
the net decrease from these special classes would reduce the estimate 
for July 1 to about 3,050,000. There is question, however, whether the 
total number of deportations and of voluntary departures arising out 
of deportation proceedings should be subtracted from the estimate. 
Deportations and voluntary departures from 1941 to 1945 account 
for a gross decrease of over 152,000 in the alien population of the 
United States, but a considerable proportion of the aliens in these two 
classes had entered the United States illegally or had violated the 
terms of their admission. In other words, their arrival would not have 
added to the estimated resident alien population; therefore, to sub- 
tract their departures would lead to a cumulative error of under- 
estimating the number of aliens in the United States. 

In the fiscal year 1944, out of the 8,829 deportations, 4,996²⁷ were
for illegal entry or violation of the terms of temporary admission. In
1945 such cases contributed 10,029 of the 11,270 deportations.²⁸
Voluntary departures are not similarly classified, but the proportion of 
cases of illegal residence can be assumed to be at least as high as among 
the deportations. On the basis of the two-year record of cause of de- 
portation it can be estimated roughly that approximately 114,000 of 
the 152,000 deportations and voluntary departures from 1941 to 1945 
were aliens who had entered illegally or had violated the terms of tem- 
porary admission. With this adjustment the estimate of the net de- 
crease attributable to the special classes of change in the alien popula- 
tion between January 1, 1941 and July 1, 1945 is approximately 35,000. 
This brings the estimated resident alien population of continental 
United States on July 1, 1945 well below 3,200,000. Because of the
necessarily approximate character of the estimate it is best stated in 
round numbers as between 3,100,000 and 3,200,000. This figure does 
not include aliens temporarily in the United States, nor does it include 
any estimate of illegal entries. 
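
The arithmetic behind the adjustment above can be retraced in a few lines. The sketch below is only an illustration: it uses the figures quoted in the text and in the table of special classes, and it assumes that the 28,569 and 123,877 entries of that table are the deportations and voluntary departures (they sum to just over 152,000, as the text states). The variable names are illustrative and appear nowhere in the source.

```python
# Two-year record of deportations and of those due to illegal entry or
# violation of the terms of temporary admission (fiscal 1944 and 1945).
deportations_1944, illegal_1944 = 8_829, 4_996
deportations_1945, illegal_1945 = 11_270, 10_029

# Assumed reading of the table: deportations and voluntary departures.
deportations_total = 28_569
voluntary_departures_total = 123_877

# Remaining decrease classes and total increase, as given in the table.
other_decreases = 194 + 2_787 + 1_566
total_increase = 8_671

# Share of deportations that involved illegal entry or violated terms.
illegal_share = (illegal_1944 + illegal_1945) / (deportations_1944 + deportations_1945)

# Departures of aliens whose arrival never entered the resident estimate.
gross_departures = deportations_total + voluntary_departures_total
not_to_subtract = illegal_share * gross_departures

# Net decrease from the special classes after the adjustment.
net_decrease = (gross_departures - not_to_subtract) + other_decreases - total_increase

print(f"share illegal or in violation: {illegal_share:.3f}")
print(f"departures not to be subtracted: {not_to_subtract:,.0f}")
print(f"adjusted net decrease: {net_decrease:,.0f}")
```

The unrounded result comes out somewhat under the 35,000 quoted in the text, which is consistent with the text's working in round numbers.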

27 Grounds for deportation: previously debarred or deported, 1,000; remained longer than authorized,
702; entered without valid visa, 3,294.

28 Previously debarred or deported, 1,529; remained longer than authorized, 793; entered without
proper documents, 637; abandoned status of admission, 64; entered without proper inspection or by
false statements, 7,006.


ON THE USE OF SOVIET STATISTICS¹


HARRY SCHWARTZ
Syracuse University 


Unavailability of most Soviet data in recent years and the 
difficulties of properly interpreting available Soviet statistics 
complicate greatly all study of the USSR. Misinterpretations 
sometimes result from the propagandistic emphasis in Rus- 
sian statistical presentation, while frequently American un- 
familiarity with Soviet statistical concepts and institutions is 
responsible. A recent Russian publication defines precisely the 
concepts and data collection techniques used by Soviet sta- 
tisticians. This book should be consulted by those working in 
this field. 


WITH the increased importance of the USSR in the postwar world,
statisticians and economists in this country have been forced to
pay much more attention to the Soviet Union than formerly. Not only 
has this been true with respect to international political, military, and 
economic relations, but most aspects of the domestic life of the USSR 
are now being followed more carefully, and comparisons between the 
Soviet Union and other countries are being drawn more frequently. 

Study of Soviet civilization, particularly its economic aspects, must 
necessarily be based in large part upon available statistics. Unfortunately,
however, many important data are unavailable and the reliability
of much of the published statistical information issued by the
USSR’s government has often been questioned. 

The gaps in published Soviet data are most serious for recent years, 
while there is a great abundance of figures for most of the period be- 
tween World Wars I and II. During the 1920’s and early 1930’s the 
Soviet government published very complete and detailed data, usually 
in large annual handbooks corresponding to our Statistical Abstract. But 
even as early as 1931, publication of some significant information, such 
as certain types of price, wage and income data had ceased. After 1936, 
the overall quantity of published data declined sharply. With the out- 
break of war in 1939, the Soviet government shut off almost entirely the 
already greatly reduced stream of statistical information. This nearly 
complete blackout of figures has continued to the present. Most data 
available for the past seven years are fragmentary and do not cover 
many important sectors of life in the USSR. Judgments on the current 


1 The research for and the writing of this paper were made possible by a Demobilization Award
of the Social Science Research Council.

Soviet scene therefore, insofar as they depend upon information of a
statistical nature, must be made with great caution and reserve.²

For those with longer range perspective, a very significant question 
is that of the reliability of the great mass of Soviet data published before 
World War II. Here views differ sharply. Thus L. E. Hubbard asserts: 
“Whatever statistics may be circulated confidentially among the leaders 
and chief officials in Government and Party, the published statistics 
today are neither objective nor reliable.”³ E. C. Ropes has written:
“Students of statistics do not yet place complete faith in Soviet figures
as published . . . ”⁴ On the other hand, Alexander Baykov has affirmed:
“I do not share the view that Soviet statistical and other sources are
less reliable than those published in other countries. On the contrary,
systematic study over a number of years has convinced me they can be
used to analyze the economic processes and the economic system of the
USSR with the same degree of confidence as similar sources published
in other countries.”⁵

Despite these differences among leading specialists, this essay will 
attempt to evaluate the over-all reliability of Soviet statistics, and to 
suggest some cautions which must be observed by users of these data if 
they are to reach valid judgments. 

Attention may be directed first to the hint in Hubbard’s statement 
above that the Soviet regime may keep a double set of books, using a
correct set for internal administration and a false set for publication, 
the latter being designed for propaganda. The writer believes this view 
to be incorrect. Where the Soviet government has sought to hide things 
from the world, it has simply stopped publishing the relevant data, 
something it would not have had to do if it were keeping two sets of 
books. Furthermore, technical discussions in the administrative jour- 
nals of various commissariats employ published data, or data recon- 
cilable with published figures. Finally, it is perfectly possible to employ 
published Soviet data to reach conclusions highly displeasing to par- 
tisans of the Soviet regime. 


2 The most important sources of comprehensive data issued in the past half decade have been 
N. A. Voznesensky’s speech of Feb. 18, 1941, published in the Soviet press, and the Fourth Five Year 
Plan, which, though it contains primarily Soviet goals for 1950, provides much valuable insight into the 
USSR’s war and immediate postwar situation. Soviet newspapers also publish, from time to time,
isolated figures which can be usefully employed. The annual Soviet budgets are also important sources 
of data. 

3 L. E. Hubbard, Soviet Trade and Distribution, Macmillan, London, 1938, p. 370. 

4 E. C. Ropes, “The Statistical Publications of the USSR,” The Russian Review, Vol. 1, No. 1,
November, 1941, p. 125. This article is an excellent summary of the chief primary sources of Soviet
data.

5 A. Baykov, The Development of the Soviet Economic System, Cambridge University Press, Cambridge,
1946, p. xiv.

Nevertheless, people working with Russian data often find contradictory
figures in the same or different sources, and sometimes believe
this is evidence of statistical fabrication. This is unlikely since a fabri- 
cated set of data could probably be kept quite consistent. Contradic- 
tions, after all, are also sometimes found in American data, arising 
sometimes between preliminary and final data, between figures bearing 
similar labels but having different definitions or areas of coverage, and 
between statistics on the same subject collected by different agencies 
employing different procedures. The same situation probably prevails 
with respect to Soviet data, but the contradictions there are more diffi- 
cult to resolve because Americans have almost no access to Soviet sta- 
tisticians. It is probably true, however, that Soviet data in general have 
a greater margin of error than American data since the USSR has had 
less time to build a trained corps of statisticians and enumerators; the 
physical difficulties to be overcome in the USSR are greater; and there 
are certain motives for false reporting in the USSR which are less frequent
in the United States.⁶

Even though credence is given to Soviet statistics, users of such data 
can easily draw wrong conclusions from them. Persons unable to read 
Russian are particularly vulnerable since they must consult secondary 
sources, but even individuals able to use primary sources may mis- 
interpret published figures if they are not experienced in their use. 

Perhaps the most frequent reason for misinterpretation of Soviet 
data is the usual effort of the USSR’s regime to depict its domestic 
situation in the best possible light. Some statistical handbooks have 
been issued frankly as sources for propagandists. They emphasize the 
achievements attained under Lenin and Stalin, and omit or give little 
prominence to data showing unfavorable facts. For every major piece 
of research, therefore, one must attempt an independent and comprehensive
marshaling of the data from the basic primary sources. Otherwise
one runs the risk of being misled by propagandistically presented
data. 

Errors frequently arise too because of the differences in the defini- 
tions employed for some important concepts by Soviet statisticians as 
compared with the definitions employed in this country. Moreover, 
from time to time Soviet statisticians have changed their definitions, 
making later data seriously non-comparable with earlier figures bearing 


6 The Soviet press recently gave prominence to convictions of industrial and agricultural officials 
who falsified production figures in order to obtain bonuses and to avoid punishment for “sabotage.” 
Some of the physical difficulties Soviet statisticians have to overcome are interestingly described by 
Rose Somerville in “Counting Noses in the Soviet Union: The 1939 Census,” The American Quarterly 
on the Soviet Union, Nov. 1940, pp. 51-73.

the same title. This situation is exacerbated by the fact that most Soviet
statistical publications contain little or no information regarding the 
definitions of their data, nor regarding the methods of collection em- 
ployed. 

The Soviet definition of national income is probably the best known 
instance of non-correspondence between Soviet and American statisti- 
cal conventions. The Soviet concept does not include income produced 
by services such as passenger transportation and private civilian com- 
munications. Income produced by the professional work of doctors, 
teachers and others not producing material goods is also excluded, and 
the same is true of rents of living quarters. Colin Clark has estimated 
that about 5 percent must be added to Soviet national income figures 
to make them comparable with the totals based on the English definition,
but this may be an underestimate.⁷

More serious as a source of erroneous judgment is the method of 
valuation used in calculating Soviet national income. All items included 
are valued at so-called 1926/27 prices. This practice results in a signifi- 
cant upward bias in the published data for the years since 1929. This 
bias is the result of the sharp change in the composition of Soviet output 
since the base period. In 1926/27 commodities such as tractors, loco- 
motives, organic chemicals, rare metals, etc. were either not produced 
in the USSR at all or were produced in small numbers or quantity. As- 
signing 1926/27 prices to such items—actual prices for commodities 
which were produced in the USSR then, and equivalent prices for com- 
modities not produced in the USSR till after 1926/27—means valuing 
such items relatively highly. After the base period, particularly since 
the five year plans began, the USSR has become much more highly 
industrialized and these commodities have had their output multiplied 
many times while unit costs have declined considerably. Valuing these 
industrial products in 1926/27 prices for a year like 1935 or 1939 gives 
them an importance in Soviet output far disproportionate to their im- 
portance by any other measure of contribution. This bias obviously 
gets worse with the passage of time and the continued expansion of 
industrial production. Anyone using the Soviet income data as a meas- 
ure of the increase of real income in the USSR can very easily over- 
estimate the income growth over the past two decades if allowance is 
not made for this inherent bias. 
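
The direction of this bias can be seen in a toy calculation. The sketch below is purely illustrative: the two commodities, their prices, and their quantities are invented and are not Soviet data. It values the same change in output once at base-period prices, where the newly industrialized product is dear, and once at later prices, where its unit cost has fallen.

```python
# Hypothetical two-commodity economy; every figure below is invented.
base_prices = {"grain": 1.0, "tractors": 50.0}    # base-period prices: tractors scarce and dear
late_prices = {"grain": 1.0, "tractors": 10.0}    # later prices: tractor unit costs have fallen

base_output = {"grain": 100.0, "tractors": 1.0}   # base-period quantities
late_output = {"grain": 110.0, "tractors": 40.0}  # quantities after industrialization


def output_ratio(q_early, q_late, prices):
    """Later output divided by earlier output, both valued at the same fixed prices."""
    early_value = sum(q_early[good] * prices[good] for good in q_early)
    late_value = sum(q_late[good] * prices[good] for good in q_late)
    return late_value / early_value


growth_at_base_prices = output_ratio(base_output, late_output, base_prices)
growth_at_late_prices = output_ratio(base_output, late_output, late_prices)

print(f"growth valued at base-period prices: {growth_at_base_prices:.2f}x")
print(f"growth valued at later prices:       {growth_at_late_prices:.2f}x")
```

With these invented figures the base-period valuation shows roughly three times the growth of the later-price valuation, which is the kind of overstatement the paragraph above warns against.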

The problems arising from changing Soviet definitions of the same
concept may be illustrated by reference to grain yield data. Up to 1933, 
Soviet yields were reported on a barn crop basis, i.e. after allowing for 


7 Colin Clark, A Critique of Russian Statistics, Macmillan, London, 1939, pp. 5-6.

actual losses during the entire harvesting operation. In 1933, yields
began to be reported on the basis of standing crop in the fields minus 
10 per cent, a figure which probably underestimated the extent of all 
harvesting losses.⁸ Some time since 1933, without any public notice
being drawn to the fact, at least so far as several very close students of 
Soviet agriculture have been able to determine, Soviet grain yields be- 
gan to be reported on the basis of the standing crop alone, with no de- 
duction for any losses. The adoption of this new convention, so far as 
this writer knows, was not announced until reference was made to it 
as established procedure in a book published in 1944.⁹ Many foreign
students of Soviet agriculture, not knowing of this change, may easily 
have reached some incorrect conclusions by making comparisons be- 
tween yields reported on the earlier and later definitions. 

The changing area of the USSR, particularly in recent years when 
about a quarter of a million square miles have been added to Soviet 
territory, is a further statistical complication. Soviet comparisons over 
time do not always allow for this changing area, and the unwary reader 
may implicitly assume that a constant area is being discussed.¹⁰

Since most Americans are not too familiar with the planned economy 
of the USSR, errors sometimes result because Soviet data on future 
goals are quoted as if they were measures of accomplished fact. Many 
such goals have not been achieved by the expected time, however, so 
that such use of goal data may be very misleading. This error is fostered 
by a peculiarity of Russian grammar which permits the expression of a 
future meaning by what is nominally a present tense form, resulting 
sometimes in English present tense translations where the future tense 
should have been employed. It should also be borne in mind that some 
important figures in the plans of the 1930’s were only rough estimates 
based upon little or no prior enumeration. A striking example is pro- 
vided by the Second Five Year Plan. This estimated the USSR’s popu- 
lation as 165.7 million persons at the end of 1932 and set 180.7 millions 
as the expected population at the end of 1937.¹¹ Actually, the population
census of January 1939 found the USSR’s population to be 170.5
million persons. Until this was released, however, the erroneous data 


8 Izvestiya, September 21, 1933. 

9 Tsentralnoye Statisticheskoye Upravleniye Gosplana SSSR, Slovar-Spravochnik po Sotsialno-
Ekonomicheskoi Statistike (Dictionary-Handbook of Social-Economic Statistics), Gosplanizdat, Moscow,
1944, p. 88.

10 Cf. for example Voznesensky’s report on the Fourth Five Year Plan, issued in English by the 
Soviet Embassy in Washington. The point mentioned in the text is most noticeable in the discussion 
on farm output, 1932-1950, on p. 8.

11 State Planning Commission of the USSR, The Second Five Year Plan, International Publishers,
New York, no date, p. 545.

given in the Second Five Year Plan had often been employed in studies
and discussions of Soviet population. 

Errors also arise because of unfamiliarity with the complexities of 
Soviet institutional organization. The writer was once informed, for 
example, that the number of livestock in the USSR had increased be- 
tween 1938 and 1940. This view was supported by data covering live- 
stock on collective farms. But large numbers of livestock are owned 
privately by collective farmers and others. Further, in 1939 a decree 
was issued requiring collective farmers to sell large numbers of their 
privately owned animals to the collective farms. The increase in col- 
lective farm livestock may therefore merely have reflected these pur- 
chases. The behavior of the total livestock population is, of course, 
indeterminate from the evidence offered. 

The hazards cited above are merely some of the chief pitfalls encountered
by the user of Soviet data. There can be no guarantees at
any time that any user of these statistics will not reach erroneous con- 
clusions, regardless of how much experience he may have. An aid in 
minimizing errors, however, is to be found in a publication of the 
USSR’s Central Statistical Administration. This little known volume, 
Slovar-Spravochnik po Sotsialno-Ekonomicheskoi Statistike (Dictionary- 
Handbook of Social-Economic Statistics) contains the definitions used 
for most Soviet statistical series. In addition, there are some details as 
to the methods by which many of these data are collected. At least one 
copy is available in this country, in the Library of Congress. Additional 
copies should be procured for other libraries so that users of Soviet 
statistics may have the valuable aid provided by this unique volume. 

It is to be hoped that with the passage of time there will be oppor- 
tunities for direct contact between American and Soviet statisticians at 
the working level. Such access to the men who actually turn out the 
data will probably provide the most fruitful means of clearing up diffi- 
culties and ambiguities which now hamper the American student of the 
Soviet scene.

A FREQUENCY DISTRIBUTION REPRESENTED AS
THE SUM OF TWO POISSON DISTRIBUTIONS


WALTER SCHILLING, M.D.*
Stanford University School of Medicine, San Francisco 


Certain frequency distributions which resemble a Poisson 
distribution, but not quite closely enough, may sometimes be 
better represented as the sum of two Poissons. A method for 
the dichotomy and subsequent summation is described, and 
twenty-nine distributions to which the Poisson was supposedly 
a good fit, are analyzed. 


INTRODUCTION 


THEORETICALLY, it may be shown that if the probability of occurrence
of an event is extremely small, but a sufficiently large number
of independent samples are taken to obtain a number of occurrences,
this number will be distributed in the Poisson series [1]. Thus this series
is available for application to any observed discrete distribution where
a small and constant probability of occurrence is presumably to be
expected. Furthermore, as the Poisson is completely defined by a single
parameter, the mean, $\bar{X}$, agreement between observation and theory
may be determined by calculation of the goodness of fit between the
frequencies observed and those of a theoretical distribution with the
same mean. It often occurs, however, that the discrepancy between
observation and theory is too large to be accounted for by errors of
random sampling alone, even when, from a biological or some other
viewpoint, the data might be expected to be distributed in a Poisson
series.

Test of the initial hypothesis having thus failed, the most reasonable
second hypothesis may be one that differs as little as possible from the
first: either

(a) that the distribution is essentially a binomial with a finite and
relatively low upper limit on observable frequencies (if $s^2 < \bar{X}$), or

(b) that the distribution is essentially a combination of two Poisson
distributions (if $\sigma^2 > \bar{X}$).

Berkson [2] has given an example of choice of the alternatives suggested
for the case $\sigma^2 < \bar{X}$, and Muench [3] has discussed the treatment
of distributions with $\sigma^2 > \bar{X}$, which may be considered as having arisen
from combinations of more than a single Poisson under the condition
that one of the component distributions has $m = \sigma^2 = 0$. The present


* This paper is published posthumously. Dr. Schilling died December 16, 1946.

paper considers principally a special case of the problem discussed by
Muench, it being supposed that the distribution arises from a combina- 
tion of only two Poisson series, the mean of neither being assumed. 
The method was developed in ignorance of Muench’s work, has a some- 
what different approach, and in this instance is perhaps simpler to cal- 
culate. To facilitate comparison, the derived formulas will later also 
be given in Muench’s notation. 

The advantages to be derived from any logical method of splitting 
an observed distribution are obvious, for, having determined that a 
distribution is heterogeneous when considered as a whole, it is often 
possible, after separating the data into two parts, to demonstrate the 
homogeneity of each. Moreover, certain distributions lend themselves 
to either of two interpretations, that is, they may be represented as a 
single Poisson or as a combination of two. In such a case, the simpler 
interpretation is the one of choice, unless from a knowledge of the data, 
there is reason to suspect the influence of more than a single contribut- 
ing cause system. In fact, the stimulating of such suspicion is an im- 
portant by-product of the method now to be described, for evidence of 
the existence of a hitherto unsuspected source of divergence from theory
may eventually discover information of value. 


METHOD 
Testing for goodness of fit is essential. The classical frequency test,
in which $\chi^2$ is determined by comparing the observed and theoretical
cell frequencies according to the formula

$$\chi^2 = \sum \frac{(f_0 - f)^2}{f}$$

is the one in most common use. When applied to a Poisson distribution,
two degrees of freedom, one each for N, i.e., the number of cells, and
for $\bar{X}$, are lost, the table of P being entered with N - 2.

However, $\chi^2$ may also be written

$$\chi^2 = \frac{\sum (X - \bar{X})^2}{\bar{X}}$$

and since

$$\sum (X - \bar{X})^2 = (n - 1)s^2,$$

$$\chi^2 = \frac{(n - 1)s^2}{\bar{X}}$$

where $n$ is the total number in the sample and $s^2$ is the best estimate of
the parameter $\sigma^2$. This procedure will be referred to as the “moment”
test, and for determining goodness of fit between an observed distribution
and a single Poisson is more powerful than the frequency test [2].
For $\chi^2$ thus determined, the number of degrees of freedom will usually
be beyond the range of existing tables, and P must be obtained via the
formula

$$x/\sigma = \sqrt{2\chi^2} - \sqrt{2n - 3}.$$


It is not feasible to use the moment test in comparing an observed 
distribution with one resulting from the summation of two Poissons, 
hence, for that purpose, recourse must be had to the classical test. To 
facilitate comparison however, the probabilities derived by both meth- 
ods will be indicated when dealing with a single Poisson. 
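
The moment test described above can be sketched as follows. This is a minimal illustration and not the author's own computing routine: the function name and the example counts are hypothetical, and the conversion of the normal deviate to an upper-tail probability is an assumption, since the text does not state which tail of the normal table is to be read.

```python
import math


def poisson_moment_test(freq):
    """Moment test of a single Poisson; freq[x] is the number of observations equal to x."""
    n = sum(freq)
    mean = sum(x * f for x, f in enumerate(freq)) / n
    sum_sq_dev = sum(f * (x - mean) ** 2 for x, f in enumerate(freq))
    s2 = sum_sq_dev / (n - 1)              # best estimate of the variance
    chi2 = (n - 1) * s2 / mean             # chi-square on n - 1 degrees of freedom
    # Normal deviate for a chi-square beyond the range of the tables.
    deviate = math.sqrt(2 * chi2) - math.sqrt(2 * n - 3)
    # Assumption: P is read from the upper tail of the normal curve.
    p_upper = 0.5 * math.erfc(deviate / math.sqrt(2))
    return n, mean, s2, chi2, deviate, p_upper


# Invented counts for cells 0, 1, 2, 3, 4 (not data from the paper).
example_counts = [50, 30, 12, 5, 3]
n, mean, s2, chi2, deviate, p_upper = poisson_moment_test(example_counts)
print(f"n={n}, mean={mean:.3f}, s2={s2:.3f}, chi2={chi2:.2f}, deviate={deviate:.2f}, P={p_upper:.3f}")
```

In this invented example the variance estimate exceeds the mean, so the deviate is positive and the single Poisson is in some doubt.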

From the observed distribution, $f_0$, determine the total frequency, $n$,
the mean $\bar{X}$, and the second moment about the mean, $\sigma^2$. Calculate

$$s^2 = \frac{n}{n - 1}\,\sigma^2$$

and test the significance of the difference between $s^2$ and $\bar{X}$. If reasonable
agreement is obtained, say P>0.05, the hypothesis that the observed
distribution is represented by a single Poisson may be accepted,
but is not proved. If it be desired to test also the hypothesis that the
observed distribution may be considered a summation of two Poissons,
determine, in addition, the third moment about the mean, $\tau^3$, where

$$\tau^3 = \frac{\sum X^3}{n} - 3\bar{X}\sigma^2 - \bar{X}^3.$$

Let $a_1$ and $a_2$ be the means of the desired sub-distributions, and $n_1$
and $n_2$ the sub-total frequencies associated with each. From the statistics
already derived above, calculate

$$g = \frac{\tau^3 - \bar{X}}{\sigma^2 - \bar{X}} - 3$$

$$h = 4(\sigma^2 - \bar{X})$$

$$k = 2\bar{X} + g$$

$$d = (g^2 + h)^{1/2}.$$

In determining the sub-means $a_1$ and $a_2$ and their associated frequencies,
it is convenient to designate the smaller mean always as $a_1$; to this
end, treat $d$ as positive, and write

$$a_1 = 0.5(k - d)$$

$$a_2 = 0.5(k + d).$$

Then

$$n_1 = n(a_2 - \bar{X})/d$$

$$n_2 = n - n_1.$$

The calculations may be easily checked by the fact that

$$n_1 a_1 + n_2 a_2 = \sum X.$$

Having thus determined the values of the sub-means and their corresponding
n’s, construct a theoretical Poisson for each ($f_1$ and $f_2$), and add
the frequencies in corresponding cells to obtain the summation, $f_s$.
Determine the goodness of fit between the observed and summated
frequencies ($f_0$ and $f_s$), remembering that four degrees of freedom are
now lost, i.e., for $n$, $\bar{X}$, $\sigma^2$, and $\tau^3$.
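
The steps of the dichotomy can be sketched in code. The following is a minimal illustration under stated assumptions, not the author's worksheet: the function names are hypothetical, the moments are taken with divisor $n$, and the example distribution is synthetic, built as an exact sum of two Poissons (600 observations at mean 1 and 400 at mean 5) so that the recovered constants can be checked against known values.

```python
import math


def poisson_pmf(x, mean):
    """Probability of exactly x occurrences in a Poisson series with the given mean."""
    return math.exp(-mean) * mean ** x / math.factorial(x)


def summed_frequencies(a1, a2, n1, n2, cells):
    """Cell-by-cell sum of two theoretical Poisson distributions (the summation f_s)."""
    return [n1 * poisson_pmf(x, a1) + n2 * poisson_pmf(x, a2) for x in range(cells)]


def split_into_two_poissons(freq):
    """Return (a1, a2, n1, n2) from the first three moments of freq, where freq[x] is a frequency."""
    n = sum(freq)
    mean = sum(x * f for x, f in enumerate(freq)) / n
    var = sum(x ** 2 * f for x, f in enumerate(freq)) / n - mean ** 2
    tau3 = sum(x ** 3 * f for x, f in enumerate(freq)) / n - 3 * mean * var - mean ** 3
    if var <= mean:
        raise ValueError("variance does not exceed the mean; dichotomy impossible")
    g = (tau3 - mean) / (var - mean) - 3
    h = 4 * (var - mean)
    k = 2 * mean + g
    d = math.sqrt(g * g + h)
    if d > k:
        raise ValueError("d exceeds k; the smaller sub-mean would be negative")
    a1, a2 = 0.5 * (k - d), 0.5 * (k + d)
    n1 = n * (a2 - mean) / d
    n2 = n - n1
    # Check from the text: n1*a1 + n2*a2 should equal the sum of all observations.
    assert math.isclose(n1 * a1 + n2 * a2, n * mean, rel_tol=1e-9)
    return a1, a2, n1, n2


# Synthetic check: an exact sum of two Poissons should be split into its parts.
synthetic = summed_frequencies(1.0, 5.0, 600.0, 400.0, 60)
a1, a2, n1, n2 = split_into_two_poissons(synthetic)
print(f"a1={a1:.4f}, a2={a2:.4f}, n1={n1:.1f}, n2={n2:.1f}")
```

With real counts the recovered sub-totals would be rounded and the summation compared with the observed frequencies by the classical test, as the text describes.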

At least four significant figures should be retained in the determina- 
tion of the constants. For the calculation of the theoretical distribu- 
tions, Molina’s [4] “Table I”, which gives arguments for means: a= 
0.001 to a=100.0 will prove invaluable. In practice it is generally un- 
necessary (unless the mean is very small) to interpolate between tabled 
values. Similarly, the use of whole numbers for the cell frequencies 
comprising the summation is suggested, and the ‘n’ values should be 
rounded to the nearest integer. Such procedure tends to distort the 
value of $\chi^2$ somewhat, but unless P turns out to be on the borderline
of significance, this error may be disregarded. In what follows, values of 
P in the neighborhood of 0.05 have been recalculated by interpolation 
from means of four decimals, the cell frequencies having been deter- 
mined to the nearest tenth. In Table 1, such recalculations have been 
starred. 

Two hypotheses, (a) that the distribution may be represented by a 
single Poisson, and (b) that it may be represented as the sum of two 
Poissons, have thus been tested, and the result, for the cases cited, will 
fall into one of the three categories defined below, according to the 
value determined for P, thus:

Category   Defined as                          Single Poisson   Two Poissons
I          P_M < 0.05, P_2 < 0.05              Untenable        Untenable
II         P_M ≥ 0.05, P_2 < P_M               Tenable          Untenable
III        P_M ≤ P_1, P_2 > 0.05, P_2 > P_1    Untenable        Tenable


where $P_M$ is the probability for a single Poisson derived from the moment
test, and $P_1$ and $P_2$ the probabilities, respectively, for a single
Poisson and for two Poissons derived from the frequency test.

The definitions given above are arbitrary, and while satisfactory for 
classifying the distributions here, may not hold for all cases. The judg- 
ment to be made in doubtful instances should be weighted in favor of 
the simpler hypothesis. 

Furthermore, dichotomy will be impossible in the presence of any
one of the following restraints: (a) $\sigma^2 < \bar{X}$, (b) $d > k$, (c) no degrees of
freedom remain for $P_2$.


DERIVATION OF THE ESTIMATION EQUATIONS 


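If two frequency distributions, with sub-totals $n_1$ and $n_2$, and means,
variances, and third moments about the mean,

$$\bar{X}_1 = \frac{\sum X_1}{n_1}, \qquad \bar{X}_2 = \frac{\sum X_2}{n_2},$$

$$\sigma_1^2 = \frac{\sum X_1^2}{n_1} - \bar{X}_1^2, \qquad \sigma_2^2 = \frac{\sum X_2^2}{n_2} - \bar{X}_2^2,$$

$$\tau_1^3 = \frac{\sum X_1^3}{n_1} - 3\bar{X}_1\sigma_1^2 - \bar{X}_1^3, \qquad \tau_2^3 = \frac{\sum X_2^3}{n_2} - 3\bar{X}_2\sigma_2^2 - \bar{X}_2^3,$$

be combined into a single distribution, with a total number, $n$, mean,
$\bar{X}$, variance, $\sigma^2$, and third moment about the mean, $\tau^3$, we shall have,
for the first moment

$$\bar{X} = \frac{n_1}{n}\bar{X}_1 + \frac{n_2}{n}\bar{X}_2, \tag{1}$$

for the second moment

$$\sigma^2 = \frac{n_1}{n}\sigma_1^2 + \frac{n_2}{n}\sigma_2^2 + \frac{n_1}{n}\bar{X}_1^2 + \frac{n_2}{n}\bar{X}_2^2 - \bar{X}^2, \tag{2}$$

and for the third moment

$$\tau^3 = \frac{n_1}{n}\tau_1^3 + \frac{n_2}{n}\tau_2^3 + 3\left[\frac{n_1}{n}\bar{X}_1\sigma_1^2 + \frac{n_2}{n}\bar{X}_2\sigma_2^2\right] + \frac{n_1}{n}\bar{X}_1^3 + \frac{n_2}{n}\bar{X}_2^3 - 3\bar{X}\sigma^2 - \bar{X}^3. \tag{3}$$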

If the component distributions be Poisson distributions, with means
$a_1$ and $a_2$ respectively, then

$$\bar{X}_1 = \sigma_1^2 = \tau_1^3 = a_1$$

$$\bar{X}_2 = \sigma_2^2 = \tau_2^3 = a_2.$$

If we write

$$r = \frac{n_1}{n}, \qquad 1 - r = \frac{n_2}{n}, \qquad a_2 - a_1 = d$$

and substitute these values in the above equations, we shall have, for
the first moment

$$\bar{X} = a_2 - rd \tag{4}$$

and for the second moment

$$\sigma^2 = r(1 - r)d^2 + \bar{X}. \tag{5}$$

The third moment reduces to

$$\tau^3 = a_2 - rd + 3r(1 - r)d^2 - r(1 - r)(1 - 2r)d^3$$

and substituting from (4) and (5)

$$\tau^3 = \bar{X} + 3(\sigma^2 - \bar{X}) - (\sigma^2 - \bar{X})(1 - 2r)d. \tag{6}$$

From (5)

$$r(1 - r)d^2 = \sigma^2 - \bar{X} \tag{5a}$$

and from (6)

$$-(1 - 2r)d = \frac{\tau^3 - \bar{X}}{\sigma^2 - \bar{X}} - 3 = g. \tag{6a}$$

Multiplying (5a) by 4 and squaring both sides of (6a) give respectively

$$4rd^2 - 4r^2d^2 = 4(\sigma^2 - \bar{X}) = h$$
d? — 4rd? + 4r*d? = g?. 
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Adding, and taking the square root of both sides, we have 
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d= [g +h)". 


Now the general form of the quadratic

$$x^2 - bx + c = 0$$

may also be written

$$x^2 - (x_1 + x_2)x + x_1x_2 = 0.$$

Thus for our purpose, we may write

$$a^2 - (a_1 + a_2)a + a_1a_2 = 0. \tag{8}$$

From (4)

$$a_2 = \bar{X} + rd \tag{9}$$

and from (9)

$$a_1 = \bar{X} - (1 - r)d \tag{10}$$

hence

$$a_1 + a_2 = \bar{X} - (1 - r)d + \bar{X} + rd = 2\bar{X} - (1 - 2r)d,$$

and from (6a)

$$a_1 + a_2 = 2\bar{X} + g = k. \tag{11}$$

Also

$$a_1a_2 = [\bar{X} - (1 - r)d][\bar{X} + rd] = \bar{X}^2 - (1 - 2r)d\bar{X} - r(1 - r)d^2 = \bar{X}^2 + g\bar{X} - (\sigma^2 - \bar{X}) = \bar{X}^2 + (g + 1)\bar{X} - \sigma^2. \tag{12}$$

The quadratic may thus be rewritten

$$a^2 - ka + [\bar{X}^2 + (g + 1)\bar{X} - \sigma^2] = 0 \tag{13}$$

where the roots are

$$a = 0.5\left\{k \pm \sqrt{k^2 - 4[\bar{X}^2 + (g + 1)\bar{X} - \sigma^2]}\right\}.$$

The expression under the radical reduces to

$$4\bar{X}^2 + 4g\bar{X} + g^2 - 4\bar{X}^2 - 4(g + 1)\bar{X} + 4\sigma^2 = g^2 + 4(\sigma^2 - \bar{X}) = g^2 + h = d^2$$

whence

$$a = 0.5[k \mp d] \tag{14}$$

$a_1$ being the negative root.
Furthermore, as $(n_1/n) = r$, we have, from (9)

$$n_1 = n(a_2 - \bar{X})/d \tag{15}$$

$$n_2 = n - n_1. \tag{16}$$


The means $a_1$ and $a_2$ are thus the parameters of the two Poisson
distributions whose sum will best approximate the observed frequency
distribution, if the latter can, in fact, be so represented. Also, $n_1$ and $n_2$
are the sub-totals associated with the respective means.

As has been pointed out, this paper considers a special case of 
Muench’s more general problem. In that portion of his paper dealing 
with Poisson distributions, Muench extends the range of hypotheses for 
consideration to include those involving 

(a) a third (or second) sub-distribution with zero mean and standard 
deviation, and 

(b) a “chain” Poisson.

He does not, however, provide a solution for the specific case which 
is here chiefly considered, namely, that of two Poisson sub-distributions 
in the absence of a third component with zero mean. 

Making use of moment functions (instead of moments about the
mean) defined as

$$\psi_0 = 1$$

$$\psi_1 = \nu_1$$

$$\psi_2 = \nu_2 - \nu_1$$

$$\psi_3 = \nu_3 - 3\nu_2 + 2\nu_1,$$

the special case of a simple dichotomy will result in the following
normal equations:

$$\psi_0 = 1 = V + W$$

$$\psi_1 = Vm_V + Wm_W$$

$$\psi_2 = Vm_V^2 + Wm_W^2$$

$$\psi_3 = Vm_V^3 + Wm_W^3$$

where $m = a$.

The equations¹ for the solution then become

$$(\psi_0\psi_2 - \psi_1^2)a^2 + (\psi_1\psi_2 - \psi_0\psi_3)a + (\psi_1\psi_3 - \psi_2^2) = 0$$

$$W = \frac{\psi_1 - a_V}{a_W - a_V}$$

$$V = 1 - W$$



and it will be noted, on referring to Muench’s paper, that the quadratic 
given above is identical with his, except that each subscript is reduced 
by unity. 
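
As a rough check on this alternative route, the sketch below solves the quadratic in the moment functions directly. It assumes that the moment functions are the factorial moments formed from the moments about the origin (first, second minus first, third minus three times the second plus twice the first); the function and variable names are hypothetical.

```python
from math import sqrt


def moment_functions(freq):
    """Return (psi0, psi1, psi2, psi3) for a frequency table freq[x]."""
    n = sum(freq)
    nu1 = sum(f * x for x, f in enumerate(freq)) / n
    nu2 = sum(f * x ** 2 for x, f in enumerate(freq)) / n
    nu3 = sum(f * x ** 3 for x, f in enumerate(freq)) / n
    return 1.0, nu1, nu2 - nu1, nu3 - 3.0 * nu2 + 2.0 * nu1


def solve_dichotomy(freq):
    """Solve the quadratic for the two means, then the proportions V, W."""
    psi0, psi1, psi2, psi3 = moment_functions(freq)
    qa = psi0 * psi2 - psi1 ** 2
    qb = psi1 * psi2 - psi0 * psi3
    qc = psi1 * psi3 - psi2 ** 2
    disc = qb * qb - 4.0 * qa * qc
    if qa == 0 or disc < 0:
        raise ValueError("no real pair of means for these moments")
    root = sqrt(disc)
    # Order the roots so that a_v is the smaller mean.
    a_v, a_w = sorted(((-qb - root) / (2.0 * qa), (-qb + root) / (2.0 * qa)))
    w = (psi1 - a_v) / (a_w - a_v)  # proportion belonging to the larger mean
    v = 1.0 - w
    return a_v, a_w, v, w


if __name__ == "__main__":
    a4 = [162, 267, 271, 185, 111, 61, 27, 8, 3, 1]
    print(solve_dichotomy(a4))
```

For the frequencies of distribution (a4) the two roots agree with the means obtained from the moments about the mean, as they should if the two formulations are equivalent.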


APPLICATIONS 


Thorndike [5], in “Applications of Poisson’s Probability Summation,”
cites thirty-two distributions to which the Poisson was supposedly a
good approximation, and these have been most suitable for review in
the light of the present discussion. Three of the distributions, (a5),
(b5), and (c12), do not lend themselves to analysis because of
insufficient data. The constants of the remaining twenty-nine appear in
Table 1, arranged in order of the categories already mentioned.

The sources of the several distributions follow, together with Miss
Thorndike’s comments.² Further interpretations, particularly in regard
to assignable causes in Category III, are also given.


I. Both Hypotheses Untenable: 

(c11) American Telephone & Telegraph Co., 195 Broadway, New
York City: “This distribution was obtained from a count of the number
of party-line subscribers listed per page of a large telephone directory.”

If p is a constant, as seems reasonable to suppose, the distribution
should be a binomial with n equal to the average number of subscribers
per page. The data indicate, however, that p is not constant. Some pages
tend to have a concentration of business houses, government, or other
non-party line subscribers. But as the degree of such concentration is
variable, a “chain” binomial (which is beyond the scope of this paper)
might afford a better fit.


II. Single Poisson Only: 


(a1)(a2) Ernest Rutherford and Hans Geiger: “The Probability
Variations in the Distributions of Alpha Particles,” Phil. Mag., Vol. 20,
pp. 698-707, October 1910. “They observed the collision, with a small
screen, of an alpha particle emitted from a small bar of polonium
placed at a short distance from the screen. The number of such
collisions in each of 2608 eighth-minute intervals was recorded, and the
distance between bar and screen was gradually decreased so as to
compensate for the decay of the radioactive substance. From this record
two frequency distributions were calculated, that of the number of
particles striking the screen in an eighth-minute interval, and in a
quarter-minute interval. These are distributions (a1) and (a2)
respectively.”

¹ I am indebted to Professor George Polya of Stanford University for the
derivation of these equations.

² In reproducing the comments, indicated by quotation marks, the present
author has introduced occasional slight changes in wording. Editor.

(a3) Lucy Whitaker: “On the Poisson Law of Small Numbers,” 
Biometrika, Vol. 10, pp 36-71, 1914. “This is based on a count of the 
number of death notices of men 85 years of age and over, appearing in 
the London ‘Times’ on each day for three consecutive years, 1910-12.” 
See also (a4). 

(b1)(b2) “Student”: “On the Error of Counting with a Haemocytometer,”
Biometrika, Vol. 5, pp 351-360, 1906-07. “This shows the
results obtained from two different solutions of yeast cells by counting 
the number of cells per square of a haemocytometer slide on which the 
solution had been spread as uniformly as possible after it had been 
thoroughly shaken to break up any clumps of cells.” 

(b4) W. W. Duffield: Logarithms, Their Nature, Computation, and 
Uses, Washington, 1897. “Shows the result of a count of the number 
of times the number ‘12’ appeared as the last two digits of a ten-place 
logarithm in a sample consisting of a column of 100 logarithms in Duf- 
field’s tables.” 

Assuming ‘12’ as likely as any other combination of two digits, the
theoretical mean equals 1. On this theory, a binomial with $n = 100$ is
to be expected, with $\sigma^2 = (pq)n = 0.99$. The observed mean is 0.842 and
the observed standard deviation is 0.913.

(b6) Telephone Company: “This example was taken from telephone 
company records of local service observations. A sample consisted of 
the calls observed at one central office in one month, and the series of 
samples used was selected from a complete record for all the central 
offices in a large city, by the requirement that the number of calls per 
sample be not less than 450 or more than 550. Distribution (b6) was 
obtained for the number of incorrect reports.” 

(c1) L. von Bortkewitsch: Das Gesetz der kleinen Zahlen, Leipzig,
1898. “This is the classic example of the Poisson exponential.
Bortkewitsch found from the records of the Prussian army, the number of
men killed by the kick of a horse in each of 14 corps in each of 20
successive years, and, after discarding the records for 4 corps which
were considerably larger than the others, treated the rest as one series
of samples.”

(c2) Telephone Company: “This distribution is similar to (b4) except 
that the samples of 100 two-place numbers were obtained from several 
different sources, logarithmic tables, trigonometric tables, and numbers 
listed in a telephone directory.” 

(c3)(c4)(c5)(c6) Telephone Company: “These examples show the 
variation in the number of telephone messages recorded per five-minute 
interval for certain groups of coin-box telephones in a large transporta- 
tion terminal. The number of calls registered for each of 23 such tele- 
phones in each of about 20 five-minute intervals between noon and 
2 p.m. was recorded on each of seven days (no Saturdays or Sundays 
included) but as the telephones are arranged in groups, the distribution 
of the number of calls per interval was calculated for each group rather
than for the individual telephones. These shown here are for a group of 
two (c3), a group of four (c4), another group of two (c5), and a group 
of six (c6).” 

(c10) Telephone Company: “This is similar to (b6) except that the 
limits of the number of calls per sample were 515 ± 25. The distribution
represents the number of connections to the wrong number.” 

(d1) (d2) W. W. Duffield: Logarithms, Their Nature, Computation, 
and Uses, Washington, 1897. “These are the same material as (b4), 
differently arranged. In (d1) the 50,000 logarithms used are divided
into 100 groups of 500 logarithms each. In (d2) they are divided into 
50 groups of 1,000 logarithms each.” 

(d4) (d5) Jean Perrin: Brownian Movement and Molecular Reality,
London, 1910. “These distributions have been given by Perrin as typical
of the data obtained when, in order to determine the density of the
particles of an emulsion at a given depth, he restricted his field of
vision to a tiny part of that layer, small enough so that the average
number of particles visible was only one or two, and then made a large
number of observations of the number of particles in that space, at
regular intervals.”


III. Two Poissons Only: 

(a4) Lucy Whitaker: “On the Poisson Law of Small Numbers,”
Biometrika, Vol. 10, pp. 36-71, 1914. This distribution is similar to that
of (a3), but “is based on the number of death notices of women, 80
years of age and over, in the London ‘Times’ on each day for three
consecutive years.”

In order to demonstrate the method, the complete analysis of this
distribution follows. See Table 2 and Chart 1.

















TABLE 2

                 Single Poisson            Summation of Two Poissons
                                      First       Second
    x       f         f_t           component   component    f_t      f     (f - f_t)^2/f_t
    0      162        121              105          58       163     162        0.006
    1      267        267              115         151       266     267        0.004
    2      271        294               63         196       259     271        0.556
    3      185        216               23         170       193     185        0.332
    4      111        119                6         111       117     111        0.308
    5       61         52                1          58        59      61        0.068
    6       27         19                1          25        26      27        0.038
    7        8          6                            9         9       8
    8        3          2                            3         3       3
    9        1          1                            1         1       1
   7-9 (pooled)                                               13      12        0.077
    n     1096       1096              314         782      1096    1096

Single Poisson ($n = 1096$, $\bar{X} = 2.2$*): test by $\chi^2 = (n - 1)s^2/\bar{X}$;
$\chi^2 = 2647$ with $n - 1 = 1095$; $P_1 < 0.001$.

Summation of two Poissons ($a_1 = 1.1$*, $a_2 = 2.6$*): $\chi^2 = 1.389$ with
$N - 4 = 4$; $P_2 = 0.845$.

$\bar{X} = 2.1569 \qquad \sigma^2 = 2.6051 \qquad \tau^3 = 3.2183$

$g = \{(\tau^3 - \bar{X})/(\sigma^2 - \bar{X})\} - 3 = 1.0614/0.4482 - 3 = -0.6319$

$h = 4(\sigma^2 - \bar{X}) = 1.7928$

$k = 2\bar{X} + g = 3.6819$

$d = (g^2 + h)^{0.5} = 1.4806$

$a = 0.5(k \mp d)$: $\quad a_1 = 1.1006, \quad a_2 = 2.5818$

$n_1 = n(a_2 - \bar{X})/d = 314.5$

$n_2 = n - n_1 = 781.5$

* To the nearest tenth.
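
The goodness-of-fit computation in the right-hand part of Table 2 can be imitated as follows. This is a sketch only: the function names are hypothetical, the tail classes are pooled from x = 7 upward as in the table, and the expected frequencies are left unrounded, so the statistic need not reproduce the tabled 1.389 exactly. The closed-form tail probability used here holds for four degrees of freedom only.

```python
from math import exp, factorial


def poisson_expected(total, mean, x):
    """Expected frequency of count x in a Poisson sample of size total."""
    return total * exp(-mean) * mean ** x / factorial(x)


def two_poisson_chi_square(observed, n1, a1, n2, a2, pool_from):
    """Chi-square for the fit of a sum of two Poissons.

    Classes x >= pool_from are pooled into a single tail class.
    Returns the statistic and the number of classes used.
    """
    expected = [
        poisson_expected(n1, a1, x) + poisson_expected(n2, a2, x)
        for x in range(pool_from)
    ]
    obs = list(observed[:pool_from])
    # Tail class: whatever expected frequency is left beyond pool_from - 1.
    expected.append(n1 + n2 - sum(expected))
    obs.append(sum(observed[pool_from:]))
    chi2 = sum((o - e) ** 2 / e for o, e in zip(obs, expected))
    return chi2, len(obs)


def chi_square_tail_4df(chi2):
    """P(chi-square with 4 degrees of freedom exceeds chi2)."""
    return exp(-chi2 / 2.0) * (1.0 + chi2 / 2.0)


if __name__ == "__main__":
    a4 = [162, 267, 271, 185, 111, 61, 27, 8, 3, 1]
    chi2, classes = two_poisson_chi_square(a4, 314, 1.1, 782, 2.6, pool_from=7)
    print(classes, chi2, chi_square_tail_4df(chi2))
```

With eight classes and four constants fitted from the data (two means, and two sub-totals whose sum is fixed), four degrees of freedom remain, which is the N - 4 entry of the table.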


The sub-distributions determined by the method probably reflect
seasonal variation, on the not unreasonable assumption that an
increase in the average number of deaths reported per day might be
expected during the winter months. But distribution (a3), which gives
comparable data for deaths of men, eighty-five years of age and over,
is adequately represented by a single Poisson only. The cause of this,
however, undoubtedly lies in the low mean value associated with the
latter group. It is apparent, on referring to Table 1, that no distribution
with mean less than unity could be successfully split, because of the
appearance of one of the restraints previously imposed.


CHART 1

[Chart: distribution (a4), $n = 1096$, $\bar{X} = 2.2$, shown with the components of
the summation, $a_1 = 1.1$, $n_1 = 314$ and $a_2 = 2.6$, $n_2 = 782$.]

(a6) M. Greenwood and J. D. C. White: “A Biometric Study of
Phagocytosis with Special Reference to the Opsonic Index,” Biometrika,
Vol. 6, pp 376-401, 1908-09. “This distribution was obtained from a
count of the number of bacilli in each of 1,000 phagocytes or white
blood cells, in the same solution, and as far as possible under the same
conditions, and is typical of a large number of distributions of the
number of tubercle bacilli ingested per cell.”

If the conditions remained constant (and there is no reason for
supposing otherwise), two alternatives present themselves, i.e., that there
were differences in (a) the bacilli or (b) the phagocytes. In the absence
of more detailed information, arguments may be adduced in support of
either hypothesis. On the one hand, the organisms, which consisted of
a culture of dead tubercle bacilli, could have been a mixed population
as regards virulence, in that the individual members of the culture may
have been in different dissociative phases. On the other hand, the white
cells could have varied in phagocytic power due to differences in age
of the individual cells. The original article unfortunately gives no
information which would be of interpretative value.

(b3) Telephone Company: “This example was obtained from the
records of the ‘lost and found’ office of the Telephone and Telegraph 
Building in New York City. The number of lost articles found in the 
building and turned in to the office on each day except Sundays and 
holidays was recorded and tabulated for the period from November 1, 
1923 to September 30, 1925 inclusive, excluding June, July, and August 
of each year when there might be considerable variations in the popula- 
tion of the building.” 

Commenting further on this distribution, Miss Thorndike says, “the 
average number of articles lost per day might be expected to increase as 
the population of the building increased in this period following the 
completion of an addition.” In view of the dichotomy, however, it seems 
clear that the days fall into two fairly distinct categories, the number 
of lost articles per day in one tending to be about three times as great 
as for days in the other. It is suggested that the difference may be re- 
lated to the weather or to the number of visitors. 

(b7) Telephone Company: This distribution is for the number of
cutoffs observed under the same conditions as (b6).

The observed distribution is definitely not a simple Poisson;
furthermore, it also appears doubtful, in view of the low value determined
for $P_2$ (0.071), whether a summation adequately represents the data. On
the theory that this might be one of Muench’s “chain” Poissons, the
necessary constants were calculated, giving


The test for goodness of fit now gives $P_2 = 0.183$ with five degrees of
freedom, still not very high, but the nature of the discrepancies tends
to the belief that they arise from sampling fluctuations, a quite tenable
hypothesis. Search for a better fitting function would appear to be
unwarranted.

(b8)(b9) Telephone Company: These distributions are for the number
of double connections and the number of calls for the wrong number
observed, respectively, under the same conditions as (b6).

(c7)(c8)(c9) Telephone Company: The distributions represent, re- 
spectively, the number of cutoffs, the number of double connections, 
and the number of calls for the wrong number observed under the same 
conditions as (c10). 

In these distributions, the choice of hypothesis is a matter to be
decided by the investigator, who alone is familiar with the conditions
under which the data were obtained. The fact that a dichotomy is not
excluded should, however, lead to a re-examination of the data, with
the object of discovering the possible influences of unrecognized
secondary cause systems.

(d3) G. F. Chambers: Handbook of Astronomy, 4th ed., Oxford,
1889. “This is the number of comets observed per year for the years
1789 to 1888 inclusive.” Miss Thorndike makes the further comment 
that this, among others, is a case in which there “appears to be a defi- 
nite trend away from the corresponding Poisson distribution” in that 
“the average number of comets observed per year would naturally in- 
crease steadily as a result of the continual improvement in telescopes 
and other instruments” during the period covered by the data. 

A satisfactory fit to the observed distribution was obtained, however,
by dichotomy and summation, which seemed to indicate the presence
of two distinct parameters, rather than a single parameter defined by a
moving average. Thus the source was consulted, minor errors being
discovered in the frequencies of the zeroth and sixth classes. These
made no appreciable difference in the final result, giving a mean, $\bar{X}$, of
2.64 instead of 2.58. The ten-year means of the corrected data follow:


Decade    1     2     3     4     5     6     7     8     9    10
Mean     1.2   1.1   1.1   2.4   1.2   3.2   4.1   3.3   3.6   5.2


and it will be noted that every mean in the first five decades is less
than any mean in the last five. This would place the point of dichotomy
at about 50 years from the starting date of the observations, which is
roughly comparable to the value of 42 years determined theoretically.
Now, according to hypothesis, the sub-distribution formed by an actual
count (from the original data) of the number of comets observed during
the first 42 years should be reasonably fitted by the first theoretical
sub-distribution. Similarly for the actual and theoretical 58 year
sub-distributions. Furthermore, the sub-distributions may now be
considered independent, and, as they are forced to agree only with respect
to their totals, the table of P is entered with N−1. Thus,

                      $n_1 = 42$              $n_2 = 58$
                   $a_1$      $P_1$        $a_2$      $P_2$
   Theoretical     0.…                     …
   Actual          1.45     0.025          …          …

where it will be noted that the smaller distribution gives a value for P
so low that some cause other than random sampling must be accounted
for.

To this end, the original data were again consulted, and it was found 
(as stated) that all comets observed during the century had been in- 
cluded in the tabulation, regardless of whether they had appeared more 
than once. Thus, for example, Encke’s comet had been observed and 
counted not less than 24 times, Faye’s, 6, Biela’s, 5, Brorsen’s, 4, etc., 
facts which could surely be expected to affect randomness, as a comet 
once observed has a greatly increased chance of being observed again. 
Therefore a new tabulation was made on the basis of comets observed 
for the first time, i.e., discoveries during the period. Now the ten-year 
means become 


Decade    1     2     3     4     5     6     7     8     9    10
Mean     1.1    …     …    1.9   0.6   2.5   2.9   2.7   2.3   4.0


and again the point of dichotomy would appear to be at about 50 years. 
The constants are shown in Table 1 as (d3A), and as before, the results 
fall into Category III. The number of years in the two sub-distributions 
turn out to be 48 and 52 respectively, and reasonable agreement is now 
obtained between these and their actual counterparts. Thus, 


                      $n_1 = 48$              $n_2 = 52$
                   $a_1$      $P_1$        $a_2$      $P_2$
   Theoretical     0.9                     3.0
   Actual          1.14     0.237          2.8      0.340

The results seem to indicate that in or about the year 1837 (=1789 
+48) some assignable cause leading to a marked increase in discoveries 
arose. That this was indeed the case is verified by R. T. Crawford, 
Professor of Astronomy at the University of California at Berkeley, 
who, in a personal letter states, “The increase in the number of comets 
discovered since 1839 is due principally to the increase in the number of 
telescopes and observers that began to appear about that time.” 

In conclusion, I have to express my appreciation to Professor
Holbrook Working of Stanford University for his generous and friendly
counsel.


THE RELATION OF CONTROL CHARTS TO ANALYSIS
OF VARIANCE AND CHI-SQUARE TESTS


Henry Scheffé
University of California at Los Angeles 


Some connections are established, by elementary and in- 
tuitive paths, among the following statistical techniques: 
Shewhart control charts, analysis of variance, and chi-square 
tests. 


INTRODUCTION 


In pointing out some relations among the above three methods of
statistical analysis our viewpoint will be not the modern, rigorous,
mathematical theory of testing hypotheses, but a more old-fashioned,
intuitive, elementary approach which may nevertheless offer some
insights. It is not our purpose in this paper to discuss any of the many
probability problems involved in the theoretical bases of the methods,
nor to repeat here more complete descriptions, easily available
elsewhere, of the current practice of the methods, but, assuming a little
knowledge of their practice, and introducing just enough description
to explain the notation, to expose certain relations. The control charts
we shall consider¹ are those for averages, proportion defective, and
number of defects ($\bar{x}$, $p$, and $c$-charts).

The Shewhart control chart technique separates into two distinct,
but usually related, phases called “control with respect to a given
standard” and “control with no standard given.” The statistical
theory for the first case is relatively simple; we consider here the
theoretically more difficult second case, in which the charts are used to
decide whether the process behaved as though it had been in a state of
statistical control during some past period. We suppose that the
observations have been taken in $k$ “rational subgroups”: This means
that on the basis of whatever prior knowledge we have of the process we
try to arrange the whole group of observations into subgroups such that
observations within the same subgroup are believed to be governed by
the same set of chance causes, while observations suspected to come
from different sets of chance causes get into different subgroups. The
most common basis for subgrouping is time. For simplicity of notation
we shall assume all the subgroup sizes equal, but we remark that all the
relations obtained can be generalized to the case of unequal subgroup
sizes.

¹ If an analogue is sought which bears the same relation to the control
charts for range and standard deviation ($R$ and $\sigma$-charts) as the method
of analysis of variance to $\bar{x}$-charts, or the chi-square tests to the $p$ and
$c$-charts, the analogue would be found to be the $L_1$-test for homogeneity
of variance.

Let us denote the $i$-th subgroup by

$$x_{i1}, x_{i2}, \ldots, x_{in}, \tag{1}$$

where $i = 1, 2, 3, \ldots, k$. For each kind of chart to be considered we
shall want the following averages: the subgroup averages $\bar{x}_i$,

$$\bar{x}_i = (x_{i1} + x_{i2} + \cdots + x_{in})/n, \tag{2}$$

and the grand average $\bar{\bar{x}}$ of the whole group,

$$\bar{\bar{x}} = (\bar{x}_1 + \bar{x}_2 + \cdots + \bar{x}_k)/k. \tag{3}$$


$\bar{x}$-CHARTS AND ANALYSIS OF VARIANCE


We shall discuss first $\bar{x}$-charts. With these the following further
statistics are used: either the subgroup ranges $R_i$, where $R_i$ is defined
to be the difference of the largest and the smallest of the numbers in (1),
or else the (sample) standard deviations $\sigma_i$, defined from

$$\sigma_i^2 = [(x_{i1} - \bar{x}_i)^2 + (x_{i2} - \bar{x}_i)^2 + \cdots + (x_{in} - \bar{x}_i)^2]/n.$$


If the process were in a state of statistical control, all the observations
in the whole group would have the same (population) standard
deviation $\sigma'$; an estimate $\hat{\sigma}$ of this unknown standard deviation from
“within” the subgroups is available from any one of the following
commonly used formulas: the arithmetic average of the subgroup
standard deviations,

$$\hat{\sigma} = C(\sigma_1 + \sigma_2 + \cdots + \sigma_k)/k, \tag{4}$$

or their root-mean-square average,

$$\hat{\sigma} = C\sqrt{(\sigma_1^2 + \sigma_2^2 + \cdots + \sigma_k^2)/k}, \tag{5}$$

or a certain multiple of the average range,

$$\hat{\sigma} = C(R_1 + R_2 + \cdots + R_k)/k. \tag{6}$$


In each case $C$ denotes some constant (different in each formula),
usually calculated to make $\hat{\sigma}$ an unbiased estimate of $\sigma'$ under certain
assumptions about the probability distributions of the observations;
the constant $C$ is of no further interest to us here, since it will in every
case affect only to the extent of a constant factor the statistic used as a
criterion by the method of analysis, and hence affect in an obvious way
what are considered to be the “significant” values of the statistic.
Present control chart practice is to use (6) when $n < 15$, and (4)
otherwise, although the analogue of (5) is used in place of (4) when the
subgroup sizes are unequal. The “three-sigma” control limits for the
$\bar{x}$-chart are calculated as

$$\bar{\bar{x}} \pm 3\hat{\sigma}_{\bar{x}},$$

where $\hat{\sigma}_{\bar{x}} = \hat{\sigma}/\sqrt{n}$, and $\hat{\sigma}$ is the chosen estimate of $\sigma'$. We remark that
if “$t$-sigma” limits are used with $t \neq 3$, the connections we find with the
other methods could still be made in a similar way. While the choice
of $t$ is really not relevant to the present discussion, to make it more
concrete we take $t = 3$.

On a chart with a vertical $\bar{x}$-axis the value of $\bar{x}_i$ is plotted as a
function of $i$, and the control limits are represented as horizontal lines. If
no $\bar{x}_i$ lies outside the control limits (and if in the judgment of the
interpreters of the chart there are no obvious trends or cycles revealed),
the behavior of the $\bar{x}_i$ is considered consistent with the hypothesis that
the process is in a state of statistical control. One or more of the $\bar{x}_i$ will
fall outside the control limits if and only if the largest in absolute value
of the $k$ quantities $(\bar{x}_i - \bar{\bar{x}})/\hat{\sigma}_{\bar{x}}$ exceeds 3; this is equivalent to saying
that the largest of

$$(\bar{x}_i - \bar{\bar{x}})^2/\hat{\sigma}_{\bar{x}}^2 \tag{7}$$

exceeds 9. Two or more of the $\bar{x}_i$ will fall outside if and only if the
second largest of the quantities (7) exceeds 9, etc. One authority²
prescribes that “... if not more than 1 out of 35 successive points, or
not more than 2 out of 100, fall outside the three-sigma control limits,
a high degree of control may ordinarily be assumed for that period.”
We might thus say that the judgment that the process is in a state of
statistical control is not accepted if the largest of the quantities (7) (or
the second largest, or the third largest, depending on the number $k$ of
subgroups) is suspiciously large, and assignable causes are then
sought for those subgroups whose points went outside the control
limits.

Instead of asking whether the largest (or second largest, etc.) of the
ratios (7) is suspiciously large we might ask this question about their
average

$$S_{\bar{x}}/\hat{\sigma}_{\bar{x}}^2 \tag{8}$$

where

$$S_{\bar{x}} = [(\bar{x}_1 - \bar{\bar{x}})^2 + (\bar{x}_2 - \bar{\bar{x}})^2 + \cdots + (\bar{x}_k - \bar{\bar{x}})^2]/k. \tag{9}$$

The quantity (9) may be regarded as a measure of the spread of the
subgroup averages $\bar{x}_i$ based not just on the largest (or second largest,
etc.) deviation of the $\bar{x}_i$ from the grand average $\bar{\bar{x}}$ but on the set of
deviations as a whole. We shall say that (9) measures the “general
spread” of the $\bar{x}_i$. With $\bar{x}$-charts, we recall, $\hat{\sigma}$ is usually calculated from
the average range formula (6). If instead of this we use the
root-mean-square formula (5) for $\hat{\sigma}$, then the ratio (8) is, except for a
constant factor, exactly the statistic $F$ used as a criterion in the analysis
of variance test for whether the subgroup averages differ significantly
among themselves.³

² American Standards Association, Control chart method of controlling
quality during production, American War Standard, Z 1.3-1942, p. 21.
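
The relation between the chart quantities and F can be checked numerically. The sketch below is an illustration only: the function name and the data are hypothetical, C is taken as 1, and the F-ratio is the usual one-way analysis of variance ratio (between-subgroup mean square over within-subgroup mean square) for k subgroups of equal size n.

```python
from math import sqrt


def xbar_chart_and_f(subgroups, c_const=1.0, t=3.0):
    """Control limits, per-subgroup ratios, their average, and the F-ratio."""
    k = len(subgroups)
    n = len(subgroups[0])
    means = [sum(g) / n for g in subgroups]
    grand = sum(means) / k
    # Subgroup variances with divisor n, as in the definition of sigma_i.
    variances = [
        sum((x - m) ** 2 for x in g) / n for g, m in zip(subgroups, means)
    ]
    sigma_hat = c_const * sqrt(sum(variances) / k)  # root-mean-square formula
    sigma_xbar = sigma_hat / sqrt(n)
    limits = (grand - t * sigma_xbar, grand + t * sigma_xbar)
    # Squared standardized deviations of the subgroup averages.
    ratios = [(m - grand) ** 2 / sigma_xbar ** 2 for m in means]
    general_spread = sum(ratios) / k
    # One-way analysis of variance F-ratio.
    between = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    within = n * sum(variances) / (k * (n - 1))
    return limits, ratios, general_spread, between / within


if __name__ == "__main__":
    data = [
        [10.1, 9.8, 10.3, 10.0],
        [9.9, 10.2, 10.1, 9.7],
        [10.6, 10.4, 10.8, 10.5],
        [10.0, 9.9, 10.2, 10.1],
    ]
    limits, ratios, spread, f_ratio = xbar_chart_and_f(data)
    print(limits, ratios, spread, f_ratio)
```

With C = 1 the average ratio equals F multiplied by n(k - 1)/[k(n - 1)], which depends only on the subgroup layout; a subgroup average lies outside the three-sigma limits exactly when its ratio exceeds 9.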


P-CHARTS AND CHI-SQUARE TESTS 


Next we shall consider $p$-charts. In this case each article in the
subgroup is classified as either defective or else non-defective. The
observations might still be represented by (1), where now each element
in (1) is equal either to 1 or 0, according as the corresponding article is
defective or non-defective. The meaning of the subgroup average $\bar{x}_i$ would
then be the observed proportion defective in the $i$-th subgroup, but we
shall denote this by $p_i$ instead, as is customary. The grand average $\bar{\bar{x}}$
is usually denoted by $\bar{p}$,

$$\bar{p} = (p_1 + p_2 + \cdots + p_k)/k.$$


If the process were in a state of statistical control, then the complete
group of $kn$ observations would constitute a random sample from a
binomial population with (unknown) proportion $p'$, $\bar{p}$ would be an
unbiased estimate of $p'$, the standard deviation of $p_i$ would be

$$\sigma_p = \sqrt{p'(1 - p')/n},$$

and this, being unknown, would usually be estimated by

$$\hat{\sigma}_p = \sqrt{\bar{p}(1 - \bar{p})/n}.$$

The “three-sigma” control limits on the $p$-chart are given by

$$\bar{p} \pm 3\hat{\sigma}_p.$$

If the lower limit given by this formula is negative it is usually replaced
by zero. Proceeding as with the $\bar{x}$-charts, we may say that the $p$-charts
focus our attention on the extreme values of

$$(p_i - \bar{p})^2/\hat{\sigma}_p^2,$$

whereas if we were interested in the general spread of the $p_i$ we might
again be led to consider the magnitude of

$$S_p/\hat{\sigma}_p^2, \tag{10}$$

where

$$S_p = [(p_1 - \bar{p})^2 + (p_2 - \bar{p})^2 + \cdots + (p_k - \bar{p})^2]/k.$$

³ For controlling averages, Shewhart called the $\bar{x}$-chart “Criterion I” for
control in his book (W. A. Shewhart, Economic Control of Quality of
Manufactured Product, Van Nostrand Co., 1931, N. Y.). It is of interest
that his “Criterion II” (page 319) is equivalent to using the F-ratio of the
analysis of variance: his statistic d/og may be shown to be equal to 1 − F
except for a constant factor.



The conventional statistical treatment of the question whether the
observed proportions $p_i$ might reasonably be supposed to arise in
sampling from a common binomial population, is by the chi-square
test. An “observed” $k$ by 2 table is constructed as in Fig. 1. The table
shows that in subgroup number 1 there were observed $np_1$ defective
articles and $n - np_1$ non-defective, etc. A “theoretical” table (not
shown) is constructed just as in Fig. 1, but all the entries in the “number
defective” column are $n\bar{p}$ and all the entries in the “number
non-defective” column are $n - n\bar{p}$.

   Subgroup    No. defective    No. non-defective    Total
      1           $np_1$           $n - np_1$          $n$
      2           $np_2$           $n - np_2$          $n$
      .              .                  .               .
      k           $np_k$           $n - np_k$          $n$

                          FIGURE 1

For each of the $2k$ cells the quantity

$$(O - T)^2/T \tag{11}$$

is formed, where $O$ denotes the “observed” entry, and $T$, the
“theoretical” entry in the cell. The $2k$ expressions (11) are added to get
Karl Pearson’s chi-square statistic $\chi_P^2$. The contribution to $\chi_P^2$ from the
two cells for the $i$-th subgroup is, from (11),







$$\frac{(np_i - n\bar{p})^2}{n\bar{p}} + \frac{[(n - np_i) - (n - n\bar{p})]^2}{n - n\bar{p}},$$

which may be simplified to

$$\frac{n(p_i - \bar{p})^2}{\bar{p}(1 - \bar{p})} = \frac{(p_i - \bar{p})^2}{\hat{\sigma}_p^2}. \tag{12}$$

It follows that the sum of the terms (12) is

$$\chi_P^2 = kS_p/\hat{\sigma}_p^2.$$

We see that the criterion $\chi_P^2$ used in the chi-square test is, except for
the constant factor $k$, exactly the statistic (10) to which we were led
above.
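
A short numerical check of this identity may be useful. The sketch assumes k subgroups of equal size n; the function name and the counts of defectives are hypothetical.

```python
from math import sqrt


def p_chart_and_chi_square(defectives, n):
    """Return the p-chart limits, the general-spread ratio, and chi-square."""
    k = len(defectives)
    p = [d / n for d in defectives]
    p_bar = sum(p) / k
    var_p = p_bar * (1.0 - p_bar) / n      # estimated variance of a p_i
    sigma_p = sqrt(var_p)
    lower = max(0.0, p_bar - 3.0 * sigma_p)  # negative lower limit -> zero
    upper = p_bar + 3.0 * sigma_p
    s_p = sum((pi - p_bar) ** 2 for pi in p) / k
    # Pearson chi-square summed over the 2k cells of the observed table.
    chi2 = 0.0
    for d in defectives:
        cells = ((d, n * p_bar), (n - d, n - n * p_bar))
        for observed, theoretical in cells:
            chi2 += (observed - theoretical) ** 2 / theoretical
    return (lower, upper), s_p / var_p, chi2


if __name__ == "__main__":
    limits, ratio, chi2 = p_chart_and_chi_square([4, 7, 3, 12, 5], n=100)
    print(limits, ratio, chi2)
```

For any such data the chi-square total equals k times the general-spread ratio, up to rounding error in the arithmetic.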


C-CHARTS AND CHI-SQUARE TESTS 


For the case of $c$-charts, we may imagine that the numbers in (1)
are the numbers of defects, that is, $x_{i1}$ is the number of defects in the
first article of the $i$-th subgroup, $x_{i2}$ is the number of defects in the
second article of the $i$-th subgroup, etc. The subgroup average $\bar{x}_i$ is then
the observed average number of defects per article in the $i$-th subgroup,
but instead of controlling this it is conventional to control the total
number $c_i$ of defects observed in the $i$-th subgroup. While carrying
through the further discussion in terms of the $\bar{x}_i$ would make for a
more uniform treatment of the three charts considered, we shall
nevertheless proceed with the customary $c_i$. Of course, since $c_i = n\bar{x}_i$, it is
really immaterial whether we use $c_i$ or $\bar{x}_i$, just so that we remember
that averages, standard deviations, and control limits in the former
case are $n$ times those in the latter.

If a state of statistical control existed, one would expect the $c_i$ to be
governed by a Poisson probability distribution. Let us call the mean of
the Poisson distribution $c'$; then its standard deviation is $\sqrt{c'}$. The
population value $c'$ would be unknown; an unbiased estimate of the
unknown $c'$ would be $\bar{c} = n\bar{\bar{x}}$, the average of the $c_i$,

$$\bar{c} = (c_1 + c_2 + \cdots + c_k)/k.$$

An estimate of the standard deviation of the $c_i$ would then be

$$\hat{\sigma}_c = \sqrt{\bar{c}}.$$

The “three-sigma” control limits for the $c$-chart are

$$\bar{c} \pm 3\hat{\sigma}_c,$$

where the lower limit, if negative, is again usually replaced by zero.
Following a now familiar path, we would be interested in the extreme
values of

$$(c_i - \bar{c})^2/\hat{\sigma}_c^2$$

for the control chart analysis, and in

$$S_c/\hat{\sigma}_c^2 \tag{13}$$

for a “general spread analysis,” where

$$S_c = [(c_1 - \bar{c})^2 + (c_2 - \bar{c})^2 + \cdots + (c_k - \bar{c})^2]/k.$$

The usual statistical test for whether the observed numbers $c_i$ may
plausibly be assumed to come from a common Poisson distribution is a
chi-square test with the $k$ by 1 “observed” table shown in Fig. 2,
and a corresponding “theoretical” table in which the entries in the $k$
cells are all $\bar{c}$. The contribution to the statistic $\chi_p^2$ from the $i$-th cell is
then, from (11),

$$(c_i - \bar{c})^2/\bar{c},$$

and for the sum of these $k$ terms we find

$$\chi_p^2 = kS_c/\hat{\sigma}_c^2.$$

This is again, except for the constant factor $k$, the criterion (13) to
which we are intuitively led in considering the general spread of the
$c_i$.

[FIGURE 2. The $k$ by 1 “observed” table.]
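As a rough illustration of the c-chart computations described above, the following Python sketch computes the three-sigma limits and the chi-square criterion from a set of subgroup totals. The function name `c_chart_summary` and the ten counts are hypothetical, chosen only for illustration; they are not data from the paper.

```python
import math


def c_chart_summary(counts):
    """Three-sigma c-chart limits and the chi-square spread criterion."""
    k = len(counts)
    c_bar = sum(counts) / k                      # estimate of the Poisson mean c'
    sigma_hat = math.sqrt(c_bar)                 # estimated standard deviation of the c_i
    lower = max(0.0, c_bar - 3 * sigma_hat)      # a negative lower limit is replaced by zero
    upper = c_bar + 3 * sigma_hat
    s_c = sum((c - c_bar) ** 2 for c in counts) / k   # S_c, the mean squared deviation
    chi_square = k * s_c / c_bar                 # same as the sum of (c_i - c_bar)^2 / c_bar
    # Subgroups (numbered from 1) whose totals fall outside the limits.
    flagged = [i for i, c in enumerate(counts, start=1) if c < lower or c > upper]
    return {"c_bar": c_bar, "lower": lower, "upper": upper,
            "chi_square": chi_square, "flagged": flagged}


# Hypothetical total defects observed in k = 10 subgroups.
counts = [4, 6, 3, 5, 7, 2, 5, 16, 4, 8]
result = c_chart_summary(counts)
print(result)
```

With these illustrative counts the average is 6, the upper limit is about 13.3, and only the eighth subgroup falls outside it; the chi-square sum is 140/6, or about 23.3. In common practice such a sum would be referred to a chi-square distribution with k - 1 = 9 degrees of freedom.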


PURPOSE OF CONTROL CHART METHODS COMPARED WITH 
THAT OF THE OTHER METHODS 


Perhaps the difference in spirit between the treatments of the data 
by control charts on the one hand, and by analysis of variance or chi- 
square tests on the other hand, might be put picturesquely as follows: 
In the first case we ask whether subgroups giving rise to extreme values 
of the statistic ($\bar{x}$, $p$, or $c$) being controlled may be convicted of being
criminals, and if so, we try to find and remove the causes of their aber- 
rations. In the second case we ask whether there is evidence of criminal- 
ity in the group as a whole, with less immediate interest in pinning the 
crimes on a particular subgroup. 





PROBLEMS IN PROVIDING ADEQUATE STATISTICS 
ON BUSINESS PROFITS* 


Susan S. Burr 
Board of Governors of the Federal Reserve System 


How will business profits be affected by changes in business activity
and prices? What effects will changes in profits have on business
capital expenditures and private entrepreneurial incentives? Answers 
to such questions are involved, either explicitly or implicitly, in nu- 
merous decisions made at different stages of the business cycle. Answers 
to such questions have been especially important in the transition 
period; for example, it was evident that early repeal of the excess 
profits tax would leave unusual profits, reflecting wartime maladjust- 
ments, available in the hands of some groups in the economy. Who 
would or should get the benefit? The entrepreneur, through larger 
profits; the wage earner, through higher wages; or the consumer 
through a reduction in prices of product or services; or should all three 
share? Our ability to answer such questions or to determine what is 
happening currently depends to a considerable extent on the statistical 
data on business profits. 

The needs, in general terms, are for better over-all figures on the 
level of business profits as a share of the national income; for more 
comprehensive and better stratified figures on the relative profitability 
of business in individual lines of activity in order to interpret current 
changes in volume of activity and prices in terms of their effect on 
profits. Last, but not least, we need to have profits data for recent 
periods available much more promptly. 

The problems of securing adequate statistics on business profits 
can best be considered under three main headings: (1) adequacy of
coverage of the business universe; (2) adequacy of items and their 
classifications for major analytical purposes; and (3) degree of prompt- 
ness with which data become available. 


PRINCIPAL SERIES OF DATA NOW AVAILABLE 


As a preface for considering these problems I want to review briefly 
three major types of current data on business profits. 

* Paper delivered before the American Statistical Association, Atlantic City, New Jersey, January
24, 1947.

(1) The principal detailed series of profits data regularly available
are compiled from income tax returns and published in the Statistics
of Income. These “bench mark” data include considerable detail on
corporate profits, limited details on the business profits of individual
sole proprietors who are reached by the income tax, and special
tabulations prepared from the informational returns filed by partnerships.
The most recent Statistics of Income data available cover 1943.

(2) Estimates of annual and quarterly business profits are made 
by the Department of Commerce as a part of its national income 
estimates. The figures cover corporate profits after taxes and also 
income (profits and salary) of unincorporated proprietors. These 
estimates are now available for the year 1946, but no quarterly data 
have yet been released. The Department of Commerce also projects 
some Statistics of Income series on profits by industrial groups to fill 
the time gap in order to anticipate Statistics of Income. These data are 
now available through 1945. 

(3) Current data of a fragmentary nature are regularly available, 
based largely on quarterly income statements of individual companies 
reported in financial publications. The two most generally used sets of 
data are those that appear at quarterly intervals in the monthly letter 
of the National City Bank and in the Federal Reserve Bulletin. The 
National City Bank tabulation includes profits after taxes and ratio 
of profits to net worth for about 15 subgroups of manufacturing and 
nonmanufacturing activity. The Federal Reserve Bulletin shows quar- 
terly profits data of large industrial, railroad, and public utility cor- 
porations. In the latter publication figures on profits after taxes are 
available for 10 subgroups of manufacturing and for a miscellaneous 
group of trade and service companies, while profits before and after 
taxes are available for Class I railroads, electric utility companies, 
and telephone companies. For the latter three groups figures are secured 
through Government regulatory agencies in these fields. 

I shall not comment on available profits data for special areas or 
special years, which are very helpful but which do not contribute as 
directly to data for current analysis. These include regular and special 
tabulations by the Securities and Exchange Commission from income 
statements of registered companies; special industry studies of the
Federal Trade Commission; special compilations of profits during the
war years made by the Office of Price Administration from its
corporate reporting service; and special tabulations prepared jointly by
the Federal Reserve Board and the Robert Morris Associates as a part
of the study of financial developments among manufacturing and
trade concerns by size groups. There is also at the Bureau of Internal
Revenue a source book underlying Statistics of Income tabulations in
which more detailed data are available to accredited organizations
upon request. 


PROBLEM OF COVERING THE BUSINESS UNIVERSE 


The major problem of covering the business universe is that of secur- 
ing better statistics on unincorporated business. 

Profits data for individual sole proprietors and partnerships are ex- 
tremely limited and are largely from Statistics of Income. Figures on 
net business profit or loss are included in the distributions of total 
individual income by sources. In recent years specific types of income, 
such as business profits, have also been classified by size, for individuals 
with incomes over $5,000. Available less regularly are receipts and 
expenditures from the business schedule in the individual return, again 
for larger incomes, classified by industry. The special tabulation from 
partnership returns for 1939 is somewhat similar to that from the 
individual business schedule except that more classifications of data 
are provided. In both cases the profits figures include any salary of 
the individual from the business, and he usually participates in its 
operations. While this concept has advantages from the point of 
view of administering income tax returns, it limits considerably the 
value of the data from the point of view of a profits concept. Sales and 
balance sheet data are almost nonexistent for unincorporated business. 
The data on individual proprietors reach only those required to file an 
income tax return, and the coverage was far from complete prior to 
the low personal exemptions of the war period. 

The tabulation of receipts and expenditures data from partnership 
returns for 1939 means that for that one year the Statistics of Income 
provides more complete data on business profits than in any other 
year. The gap in covering the business universe consists largely of those 
individual proprietors who are not subject to the income tax—a large 
group in 1939. A similar tabulation for 1945, now in process, will prob- 
ably cover most of the business universe because of the greatly reduced 
exemption, $500; for reporting purposes this means $500 gross from 
all sources. Of the 6-63 million business schedules filed, half repre- 
sented farmers. 

Because of the difficulties of securing satisfactory figures on profits 
of unincorporated business, there has been a tendency to assume 
that the more complete statistics on corporate enterprise are represen- 
tative of the unincorporated sector. This assumption has probably 
been fairly reasonable for some over-all measures such as the share 
of national income.

There are several important reasons now for attempting to fill this
gap. First, it is clear that there is a growing need for more reliable data 
on the fluctuations of profits of small businesses during various phases 
of the business cycle, on the relative profitability of small versus large 
businesses, and on the extent to which business concerns operate at a 
loss. Second, estimates of the net income of nonagricultural proprietors 
would be greatly improved by better basic data on unincorporated 
business. 

The present profits data would be greatly improved (1) by some 
segregation of salary of the entrepreneurs from their profits; (2) by 
more detail on costs of operations; and (3) by data for measuring 
profits in relation to net worth. Some changes in this direction can be 
made through tabulations from tax returns, but results may be costly 
and slow. The whole area of unincorporated business might be ap- 
proached through the collection of adequate income and balance sheet 
data from a sample of unincorporated business selected on a scientific 
basis. The Bureau of Internal Revenue is now experimenting with a 
sampling of 1945 returns, both corporations and individual businesses, 
to secure detailed receipts and expenditures of the smaller units. 

A question of major importance is the handling of a figure that repre- 
sents the combined profits and earnings of the entrepreneurs. Rather 
than segregate the two items, which probably presents insuperable 
difficulties, it may be necessary to adjust the analysis of small- 
business profits to the combined figure. 


PROBLEM OF SECURING MORE SIGNIFICANT DATA FOR ANALYZING 
PROFITS 


The remaining comments will be directed toward corporate profits 
since this is the major area and the one for which we now have the 
most complete data and in which improvement can be expected more 
promptly. 

As indicated earlier, the “bench mark” data come from tax returns, 
compiled by the Bureau of Internal Revenue. Briefly, a simplified in- 
come statement is available, by major industries and by subgroups, by 
asset size groups, by corporations with net incomes and deficits (in the 
sense of income subject to tax). Aggregate profits data in these classi- 
fications can also be related to major balance sheet items. 

The problem of securing statistics can be indicated in terms of the 
problems of using these figures. The Bureau of Internal Revenue has 
for many years attempted to improve these data for general analytical 
purposes. To the extent that it has failed in meeting its objectives, this
must be viewed as a part of the problem of administering an income
tax law, the Bureau’s primary responsibility. The tabulation of statis- 
tical data for general purposes cannot come first when the two objec- 
tives conflict. 

The difficulties posed by Statistics of Income data may be sum- 
marized briefly as follows: 

First, statistics secured under statutory definitions of income for tax 
purposes do not comprise in all respects profits for economic analysis. 
The Department of Commerce, in adjusting the Statistics of Income 
figures on compiled net profit for the years 1929-43 to secure profits 
according to the national income concept, makes nine adjustments, 
some of which involve more than one set of figures. The amounts for 
several items are substantial, and their distribution by industrial 
groups and asset sizes varies. All of these adjustments, however, would 
not be essential for many uses of the data. 

The major adjustments are: (1) inclusion of dividends on a net 
basis, that is, dividends received less dividends paid; (2) the addition 
of estimated additional income uncovered by audit of the tax returns; 
(3) exclusion of capital gains and losses; and (4) exclusion of charges 
to reserves for depletion of resources.¹ A somewhat different set of
adjustments is involved in deriving data intended primarily for study
of the earnings position of business enterprises. A special problem that 
arises there is the appropriate adjustment of changes in value of inven- 
tories. 

The second difficulty arises from the fact that in the Statistics of 
Income details are either unavailable or incomplete for some major 
items important in the analysis of changes in profits. There are no 
details on cost of goods sold, such as materials purchased, wages and 
salaries, changes in inventories, and other costs. Lack of income state- 
ment data on labor costs has been felt very keenly in recent years. 

Finally, the “bench mark” data by industry and size are not dis- 
tributed by relative measures of profitability, such as ratio of profits to 
sales and ratio of profits to net worth. 


PROBLEM OF SECURING DATA ON CURRENT TRENDS IN PROFITS 


One of the most serious limitations of our present profits data, as
in the entire field of business financial statistics, is the slowness with
which current data become available. The events resulting in a
particular change are long since past, or even forgotten, by the time
detailed and reliable figures are available that would help to illuminate
the various factors at work. In an early post-war period of major
readjustments, such as we are now experiencing, it is unfortunate that
so much guess work is necessary on current profit trends in different
manufacturing and nonmanufacturing activities, and that there is
even more guess work as to how changes in activity, in prices of
commodities, and in wage rates are affecting profits.

¹ Other adjustments include exclusion of mutual savings banks; inclusion of Federal Reserve Banks;
adjustments in war years for accelerated amortization and also for reduction in profits through
renegotiation; and the inclusion of State income taxes in the tax figures.

The “bench mark” data from Statistics of Income are of little help 
in the present situation, because the latest statistics are for 1943. 
Estimated aggregates for major profit items that have been prepared 
by the Department of Commerce are being used to fill the gap for 
1944-45, and that Department has performed a fine service in making 
such estimates available in some detail by industrial groups. Most 
users of the data will not be greatly surprised, however, if the final 
results for these years differ considerably from the estimates. 

“Bench mark” data, moreover, are available only on an annual 
basis. No public or private agency compiles an adequate series of 
profits for a shorter interval. The data, available for relatively few 
corporations, are on a quarterly basis. These data cover chiefly the 
larger companies and the item “profits after taxes.” In recent years, 
when tax liabilities have been changing frequently, there has been need 
for quarterly data on profits before taxes and still more recently for 
quarterly sales data and, if possible, operating costs. There should also 
be better coverage, quarterly, for certain types of nondurable manufac- 
turing, and for trade and service industries, and also for medium-size 
and small companies in all lines, roughly those with total assets under 
five million dollars. 


USE OF SAMPLE TO SECURE PROFITS DATA MORE PROMPTLY 


The needs for data more promptly for current analyses would be 
adequately covered if we had available soon after the close of each 
year income and balance statement items for a well selected sample of 
corporations, and if the annual statements were supplemented by 
quarterly data, also available promptly, from a well selected, though 
even smaller, sample. 

² The Securities and Exchange Commission in 1946 instituted a quarterly series of data on sales or
operating revenue for a large group of corporations, but corresponding net profit data are not
provided.

The major needs may be summarized thus:

(1) Data available for use more promptly after the close of the
accounting period: by six months after the annual period and 60 days
after the close of the quarterly period.

(2) Major items related to profits: sales, details of operating costs, 
operating profits, taxes, profits after taxes, dividends, and retained 
earnings. These should be classified in some detail by industrial activity, 
by size of concern, and by measures of profitability. 

Efforts should be made especially to improve the coverage in non- 
durable manufacturing, in trade, in service, and also in the small 
business area. 

(3) As a final point I would stress the need for other income and 
balance statement data along with the profits data. All business finan- 
cial data are interrelated, and an adequate interpretation of one item 
requires an understanding of its relationship to other items. This is 
particularly true of profits, which is a residual element whose level 
and fluctuations are dependent primarily on the level and fluctuations 
of other income account items. Moreover, the interpretation of any 
business income, or flow, account is strengthened by relating it to the 
balance sheet, or financial position, of the firm at the time of the par- 
ticular income flow. 

The needs for better profits data are generally recognized, and a 
well-organized attempt is being made in Washington to solve the ad- 
ministrative and other problems of meeting the needs. Some parts of 
the program have been recently announced, and I understand that it 
has been fully described at another session of these meetings. Briefly, 
annual data will be provided more promptly by the Bureau of Internal 
Revenue by sampling corporate returns as they come in for statistical 
tabulation. The Securities and Exchange Commission and the Federal 
Trade Commission plan to cooperate in securing selected quarterly 
income and balance sheet data covering samples of both large and 
small corporations in various lines of activity. 

It is too early to guess how successful the efforts will be; what data 
will be available and when. The appropriation of necessary funds by 
Congress is one uncertainty in the situation. Many of the organiza- 
tional and procedural obstacles that defeated similar efforts in prewar 
years appear no longer to exist.



SAMPLING FOR THE 1947 SURVEY OF
CONSUMER FINANCES*


ROE GOODMAN
Survey Research Center, University of Michigan 


The designing of a sample for financial surveys requires spe- 
cial attention to families with higher incomes in order that ade- 
quate information may be obtained about what the “dollars 
will do” as well as what the heads of households will do. The 
chief problem posed here is the adaptation of area sampling 
to a procedure that is more efficient than the more usual 
stratified random sampling would be. The modifications in- 
troduced were essentially devices for disproportionate sam- 
pling of households classified by average rent values and esti- 
mated income levels. In determining sample size it must be 
recognized that the sampling errors of different items of in- 
formation vary considerably from item to item. The findings 
from the 1947 Survey of Consumer Finances have been pub- 
lished in the June, July and August issues of the Federal Re- 
serve Bulletin. 

The methods and procedures illustrate the application of 
area sampling to a practical survey problem. The sample con- 
sisted of small clusters of households within cities, towns and 
in the open country of 66 primary sampling areas. 


THE 1947 Survey of Consumer Finances had as its objectives an
analysis of the current savings of the people of the United States
and their plans, attitudes, and expectations regarding savings and
spendings. It was the second survey of its kind conducted on a
nationwide basis for the Federal Reserve Board.¹ Specifically, the survey was
designed to secure information on such topics as the following: (1)
amounts of liquid assets held, their distribution among various income,
occupational and other groups, and changes in these distributions since
a year ago; (2) amounts of savings in 1946, proportions of incomes
saved, factors associated with rate of savings, and comparison of the
1946 rate with that of the previous year; (3) people's plans for using
their liquid assets, amounts to be spent in the near future, kinds of
goods to be purchased, expected price levels and trends, and effect of
these expectations on economic actions.

* A paper delivered at the 106th Annual Meeting of the American Statistical Association at
Atlantic City in January, 1947. See also Katona, this issue.

¹ Campbell, Angus and Katona, George, “A National Survey of Wartime Savings,” Public Opinion
Quarterly, 10: 373-381, 1946.

In view of the above objectives, it is evident that the sample for such
a survey must be designed to secure information about the different
income groups and about the groups holding varying quantities of
readily available funds. To meet the needs of the survey the sample 
must be designed to provide more than a cross-section of the popula- 
tion; it must give adequate representation to various segments of the 
population, particularly the segment which holds appreciable quantities 
of liquid assets. It must be a representative sample of households, but
in a sense it must also be a sample of dollars. It is important to find out 
what the heads of households are planning to do, and it is also im- 
portant to determine what the “dollars will do.” Since an interview 
with a single respondent of high income may provide one answer which 
applies to many dollars (the quantity finally estimated being the prod- 
uct of the number in this group and the average number of dollars 
reported) it is necessary to give special consideration to the households 
in the higher income groups if reliable averages and totals are to be 
obtained about the dollars. 

While the particular objectives of this survey lead to certain varia- 
tions in the method of choosing the sample, the basic design is the one 
customarily used in government enumerative surveys. That is, a set of
counties and metropolitan areas is chosen as primary sampling units,
and within each of these counties and metropolitan areas a subsample 
of dwelling units is selected for interviewing. As will be seen, the prob- 
lems of the Consumer Finances Survey require particular attention to 
the sampling within the primary sampling units. 

For a survey which has a relatively small sample, travel and adminis- 
trative costs provide definite limitations on the number of primary 
sampling units that can be included. Obviously some interviews should 
be taken in each of the largest metropolitan areas, but beyond that, a 
selection of primary sample areas must be made among the remaining 
cities and counties of the country. For the Consumer Finances Survey
each of the 12 largest metropolitan areas is taken as a single primary
sampling unit. Below this point, the counties of the United States are
grouped into 54 strata, from each of which one county is selected at 
random as the primary sampling unit to represent the stratum. 


METHOD OF STRATIFICATION 


The grouping of the counties into 54 strata was done by a mechanical 
process on the basis of certain variables for which information was on 
file. The process consisted first of dividing the counties into three 
groups, then 9, 27, and finally 54 groups. The groupings were always 
such that each group contained approximately the same total of 1940 
adult population. The variables which were used for this stratification
were: per cent of 1940 population living in urban places; average per
capita bond sales in 1943; per cent industrialization as indicated by 
1940 per cent of total employed working in manufacturing industries; 
per cent 1940 population native white; and average size of farm accord- 
ing to the 1940 Census of Agriculture. 
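The successive division into groups of roughly equal population can be sketched in Python as follows. The text does not say which variable governed each division or how ties and boundaries were handled, so the rule used here (sort on one variable per stage and cut the cumulative population into equal parts) is only an assumption, and the function names, field names, and twelve counties are hypothetical.

```python
def split_equal_population(units, key, parts):
    """Sort units on `key` and cut them into `parts` groups of roughly equal population."""
    ordered = sorted(units, key=lambda u: u[key])
    total = sum(u["pop"] for u in ordered)
    groups = [[] for _ in range(parts)]
    running = 0
    for u in ordered:
        # Place each unit by the midpoint of its span on the cumulative population scale.
        midpoint = running + u["pop"] / 2
        index = min(int(midpoint * parts / total), parts - 1)
        groups[index].append(u)
        running += u["pop"]
    return groups


def stratify(units, plan):
    """Apply successive splits; `plan` is a list of (variable, number of parts) pairs."""
    strata = [units]
    for key, parts in plan:
        strata = [g for s in strata for g in split_equal_population(s, key, parts)]
    return strata


# Twelve hypothetical counties, each with an adult population of 10 (thousands).
counties = [
    {"name": f"county_{i}", "pop": 10, "urban": 5 * i, "bonds": (7 * i) % 12}
    for i in range(12)
]

# Two stages here (3 groups, then 2 within each) give 6 strata.
strata = stratify(counties, [("urban", 3), ("bonds", 2)])
for s in strata:
    print([c["name"] for c in s], sum(c["pop"] for c in s))
```

A four-stage plan with 3, 3, 3 and 2 parts would reproduce the progression 3, 9, 27 and 54 groups mentioned in the text.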


SELECTION OF SAMPLE COUNTIES 


Within the 54 strata the selection of sample counties was at random, 
but instead of giving the counties in a stratum equal chances of selec- 
tion, each county was given a probability of selection proportionate to 
its adult population in 1940. This technique, reported by Hansen and 
Hurwitz,² is designed for the taking of a constant number of interviews
in each primary sampling unit. Hansen and Hurwitz reported substan- 
tial gains in sampling efficiency from the use of this method, as com- 
pared with the selection of counties with equal probability, in that 
there was a decrease in the sampling variance and in addition there was 
the elimination of the bias associated with the method of estimation 
commonly used. The selection of a sampling unit with probability 
proportionate to its “size” is accomplished by securing a cumulative 
total of the measures of size and then drawing at random, each sampling 
unit being assigned as many numbers (chances of selection) as its size 
requires. 
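The cumulative-total procedure just described can be sketched in Python. This is a minimal illustration, assuming integer measures of size; the helper names `owner_of` and `select_pps`, the three counties, and their adult populations are hypothetical.

```python
import random
from bisect import bisect_left
from itertools import accumulate


def owner_of(draw, cumulative):
    """Index of the unit that owns the number `draw` on the cumulative scale."""
    # Unit i owns the numbers cumulative[i-1] + 1 through cumulative[i].
    return bisect_left(cumulative, draw)


def select_pps(sizes, rng):
    """Select one unit with probability proportionate to its integer size."""
    cumulative = list(accumulate(sizes))       # running totals of the measures of size
    draw = rng.randint(1, cumulative[-1])      # random number from 1 to the grand total
    return owner_of(draw, cumulative)


# Hypothetical stratum: 1940 adult population of each county, in thousands.
adults_1940 = {"County A": 30, "County B": 50, "County C": 20}
names = list(adults_1940)
sizes = list(adults_1940.values())

chosen = names[select_pps(sizes, random.Random(1947))]
print(chosen)
```

Since County B owns 50 of the 100 numbers in this example, it has a chance of selection of one half, in proportion to its size.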


THE PRIMARY SAMPLING UNITS: SUMMING UP


From the foregoing, it is seen that the Consumer Finances Survey in- 
cluded a total of 66 primary sampling units of which 12 are metro- 
politan areas, each consisting of all counties in the area, and 54 are 
single counties representing strata of varying numbers of counties but 
of virtually constant size in terms of adult population. Now in giving 
the details of the method of stratifying counties it was not intended to 
convey the impression that the method used was necessarily superior 
to other methods that might have been used. The conclusion on this 
point is that the method described is a convenient one which embodies 
the principle of trying to make each stratum as homogeneous as possi- 
ble with respect to the types of information sought in the survey. 
Moreover, recent findings by the Bureau of Census have indicated that 
for population inquiries, after a few factors have been used, a great 
deal of added work in the stratification of primary sampling units does 
not add appreciably to the gains from stratification. 


² Hansen, Morris H. and Hurwitz, William N., “On the theory of sampling from finite populations,”
Annals of Mathematical Statistics, 14: 333-362, 1943.


SAMPLING WITHIN THE PRIMARY SAMPLING UNITS: GENERAL


We turn now to the question of selecting the actual dwelling units
within the counties and urban areas which have been chosen as primary
sampling units. As can be seen from the diagram which follows, the
sample finally consists of clusters of dwelling units located within
sample areas. In all places except the open country, the cluster consists of
a sub-sample within the sample block within the sample town, which
in turn is located within the primary sampling unit (x and y in block
c-1 is an example of such a cluster). In the open country the cluster
includes all dwelling units in a sample area, called a segment, which has
been selected at random from the open country segments of the primary
sampling unit (the dwelling units in segment 1 under D for example).
The open country portions of each primary sampling unit are as
defined by the Master Sample of Agriculture, the development and
design of which have been described by King and Jessen.³ In the general
design depicted below, all selections of clusters, blocks, towns, and open
country segments are made by random processes, a fact which assures
that each dwelling unit within the primary sampling unit has a known
chance of coming into the sample.


SPECIAL PROBLEMS OF CONSUMER FINANCES SURVEY AS 
RELATED TO THE “WITHIN” SAMPLING 


In deciding on the procedure for choosing dwellings within a primary 
sampling unit, it was necessary to take into account some of the con- 
ditions peculiar to the study of liquid assets, as mentioned earlier. 
Since it is known that the larger holders of liquid assets comprise but a 
small proportion of the population, steps had to be taken to secure 
adequate representation of this group. This was particularly impor- 
tant because, while representatives of this group are to be found in 
nearly every community, their activities are diverse and the amounts 
they hold in different forms are subject to extreme variation. The 
chance inclusion or exclusion of a few wealthy people in the sample can 
affect greatly the averages computed from the survey results.
Consequently it was necessary to strengthen the sample in the upper income
groups to increase the likelihood that different groups of wealth and 
income would affect the results to the proper degree. 

In statistical terms in planning such a survey it is necessary to con- 
sider the magnitudes of the sampling errors and the major components 
of these errors. In order to show specifically the type of estimate that is 
being considered here, and the components of its variance, the formulae 
applicable to the basic design illustrated above are given as follows:⁴


- 1 Q 1 » X ijk 
xX’=—> > 
Q i=l j=l Nij 


where 
X' is an estimate of the population mean of some characteristic 


3 King, A. J. and Jessen, R. J., “The Master Sample of Agriculture,” Journal of the American Sta- 
tistical Association, 40: 38-56, 1945. 

4 Since the purpose of these formulae is to depict the two major components of the sampling error, 
the estimate and the resulting variance are given in somewhat simplified form and are applicable only 
to the 54 strata. In practice, the estimate used, and consequently its precise variance, are of a somewhat 
different form but the formulae given here are satisfactory approximations in view of the comparative 
uniformity in the sizes of the strata. The fact that selection of counties was made with probability 
proportionate to numbers of adults in 1940 rather than 1947 numbers of spending units also would 
properly increase the complexity of the variance function, but this is disregarded here. 
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X ii is the value of the characteristic for the kth spending unit in 
the jth county in the 7th stratum 
n;; is the number of spending units in the sample in the jth 
county in the 7th stratum 
and Q is the number of strata 








2 1< — Ni T\2 
em Fly —y~ Xe 
Q? i=1 j=l 1Vi F 
( Nij ) 
S. . (X six X ij)? | 
1 SJ Ni33 k=l 4 
+—2, —| ; ~~ 
Ci j=l N; ( N ij 
where 


X;; is the average for all spending units in the jth county in the 
ith stratum 

X; is the average for all spending units in the 7th stratum 

N;; is the total number of spending units in the jth county in the 
ith stratum 

N; is the total number of spending units in the 7th stratum 

is the number of counties in the ith stratum 

and $c_{i}$ is the “effective size of sample” within the county (the actual
size of sample if spending units were selected at random
throughout the county)


In the variance formula, it is the within component (the last term in 
the equation) which may be reduced by an efficient method of sampling 
within the primary sampling unit. (For the metropolitan areas the 
within component is the whole error since the primary sampling unit 
is the entire stratum.) Now to the extent the effective size of sample 
$c_{i}$ (as shown in the formula) can be increased by stratification or other
sampling method, the within component of the sampling variance 
(which contains much of the variability resulting from differences in 
income levels) will be reduced and the over-all efficiency of the sample 
design will be improved. Because it has been recognized that the within 
component of the error can be very serious for important types of ques- 
tions asked in the Consumer Finances Survey, the major effort in de- 
signing the sample was directed toward improving the effectiveness of 
this phase of the sampling. 
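
The split of the sampling variance into a between-county part and a within-county part can be illustrated with a small sketch. It assumes one county drawn per stratum with probability proportionate to size, the same effective sample size in every county, and a toy population in which every spending unit's value is known; the function names and the data are hypothetical and are not taken from the survey.

```python
from statistics import mean


def estimate_mean(sampled_counties):
    """Average of the sample means, one sampled county per stratum.

    sampled_counties: one list of observed values per stratum.
    """
    return sum(mean(values) for values in sampled_counties) / len(sampled_counties)


def variance_components(strata, effective_size):
    """Return (between, within) components of the sampling variance.

    strata: list of strata; each stratum is a list of counties; each county
            is the list of values for ALL its spending units (toy population).
    effective_size: assumed effective sample size per county, taken to be the
            same in every stratum (a simplification made for this sketch).
    """
    q = len(strata)
    between = 0.0
    within = 0.0
    for counties in strata:
        n_stratum = sum(len(county) for county in counties)
        stratum_mean = sum(sum(county) for county in counties) / n_stratum
        for county in counties:
            share = len(county) / n_stratum  # selection probability of the county
            county_mean = mean(county)
            county_var = sum((x - county_mean) ** 2 for x in county) / len(county)
            between += share * (county_mean - stratum_mean) ** 2
            within += share * county_var / effective_size
    return between / q ** 2, within / q ** 2


# One stratum, two counties of two spending units each (made-up values).
toy = [[[1, 3], [5, 7]]]
print(variance_components(toy, effective_size=2))  # (4.0, 0.5)
print(variance_components(toy, effective_size=4))  # (4.0, 0.25)
```

Doubling the effective size halves the within component and leaves the between component untouched, which is the reason the design effort was concentrated on sampling within the primary sampling units.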


SAMPLING WITHIN THE PRIMARY SAMPLING UNITS IN 1946 


In the survey a year ago the general design described above was used, 
with very little modification, in all primary sampling units except
within 26 major cities. Within these cities, however, together compris- 
ing or representing about 40% of the population of the United States, 
an effort was made to reduce the within component of the sampling 
error by means of a stratification of blocks on the basis of 1940 Census 
average rental values. In allocating the sample among the strata Ney- 
man’s well known principle of optimum allocation indicated that since 
the replies from residents of high rent blocks could be expected to be 
more variable than in other blocks, the high rent blocks should be 
sampled at a higher rate. As a result the blocks in the higher rent strata 
in the 26 cities were sampled at from 3 times up to 5 times the regular 
sampling rate for these cities. In the tabulations of the results, this
“loading” of high rent blocks was, of course, offset by the use of ap- 
propriate weights. 
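
As a rough illustration of the allocation principle and of the offsetting weights, the sketch below allocates a fixed sample in proportion to stratum size times stratum standard deviation and then recombines the stratum means with population weights. The stratum sizes, standard deviations, and means are invented for the example and are not the survey's figures.

```python
def neyman_allocation(stratum_sizes, stratum_sds, total_sample):
    """Allocate total_sample across strata in proportion to N_h * S_h."""
    products = [n * s for n, s in zip(stratum_sizes, stratum_sds)]
    total = sum(products)
    return [total_sample * p / total for p in products]


def weighted_mean(sample_means, stratum_sizes):
    """Combine stratum sample means with population weights N_h / N."""
    total = sum(stratum_sizes)
    return sum(m * n for m, n in zip(sample_means, stratum_sizes)) / total


# Hypothetical city: 9,000 dwellings in ordinary blocks, 1,000 in high rent
# blocks whose replies are assumed to be three times as variable.
sizes = [9000, 1000]
sds = [1.0, 3.0]
allocation = neyman_allocation(sizes, sds, total_sample=120)
rates = [a / n for a, n in zip(allocation, sizes)]
print(allocation)  # [90.0, 30.0]
print(rates)       # the high rent stratum is sampled at about 3 times the regular rate

# Weighting by stratum size removes the effect of the "loading".
print(weighted_mean([10.0, 40.0], sizes))  # 13.0
```

Without the weights, the pooled mean of the 120 sampled dwellings in this example would be 17.5 rather than 13.0, since the high rent dwellings are over-represented in the sample.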

The stratification of the blocks into rent groups and the selection of 
sample blocks was done by the technique known as double sampling. 
As introduced by Neyman,5 the method involves the selection of a large
sample for the purpose of classifying and estimating strata totals and 
selection of a subsample within each stratum for the direct purposes of 
the survey. Census block statistics which show dwelling units and aver- 
age rentals per block for cities of over 50,000 population were used for 
the preliminary (large) sample. The use of the large sample for estimating
strata totals rather than the classifying of all blocks was advantageous
because of the saving in clerical expense. The stratification of
blocks produced a gain in sampling efficiency because it permitted the
use of different sampling rates within the different strata, whereas the 
use of double sampling merely for proportionate sampling within the 
strata would probably have resulted in virtually no gain. 

The final sample for interviewing consisted of an expected number of 
two dwelling units in each sample block; in the open country, in order 
to reduce travel costs, a somewhat larger expected number of dwelling 
units was taken, generally averaging from 2 to 7 per segment. Within 
each dwelling unit in the sample, interviews were to be taken at every 
spending unit. The total number of dwelling units selected was such as 
to yield an expected 3,000 interviews after allowing for refusals or for 
dwellings where no one could be found at home. The interviewer was 
allowed no discretion in the choice of dwellings and substitution for the 
addresses given him was not permitted. 


5 Neyman, J., “Contribution to the theory of sampling human populations,” Journal of the Ameri- 

FINDINGS FROM 1946 DATA AS A BASIS FOR IMPROVEMENT OF SAMPLE DESIGN


As a preliminary step in the designing of the 1947 sample, a check 
was made on the 1946 sample data on liquid asset holdings in order to 
secure information on possible improvements in the “within” sampling 
procedure. It had been assumed, for example, that the average holdings 
of residents of high rent blocks would be greater than those of residents 
of the remaining blocks, but it was not known how much greater and to 
what extent holdings in the high rent blocks would be uniformly greater 
than those in other blocks. The analysis showed that rental value was a 
useful means of stratification in that there was a much higher percent- 
age of holdings totaling $5,000 or more in the high rent blocks than 
elsewhere, but even in these blocks the number of households having 
such holdings was generally less than 20% of all households. It showed 
also that in the larger cities, households in the lower rent blocks rarely 
held as much as $5,000 in liquid assets and that segregation of these 
blocks in the sampling procedure was therefore worthwhile. Another 
important finding was that in the smaller towns (under 50,000 popula- 
tion) and villages, the holding of $5,000 or more in liquid assets was 
nearly as frequent in occurrence as in the larger places and that since 
35 to 40 per cent of the households of the United States are to be found 
in such places an improved method of sampling within these towns was 
highly desirable. Finally, the average holding of liquid assets per dwell- 
ing unit in the high income blocks, while several times that in the re- 
maining blocks, was in general not large enough to require sampling 
them at as high a rate as 5 times the regular rate. This last finding is 
based on the assumption so frequently borne out by experience that, as 
the means in a series of averages increase, the standard deviations of 
the respective sets of data increase also, but at a slower rate. 


MODIFICATIONS IN WITHIN SAMPLING FOR 1947 


In the 1947 survey the selection of cities, towns, blocks and open 
country segments followed the pattern used for the 1946 survey. 
Within the sample blocks a refinement was made in the over-sampling 
of high-income dwellings by another double sampling procedure. Here 
the “large” sample consisted of clusters of dwelling units selected at 
from 2 to 4 times the regular sampling rate. The interviewer rated each 
dwelling high, medium, or low in respect to probable income level, on 
the basis of general appearance of the property but without contacting 
the occupant. The “high,” “medium,” and “low” groups of dwellings 
were thereafter treated as separate strata within the primary sampling 
units. In the small sample all the dwellings rated “high” were retained 
for interviewing, but the dwellings rated “low” and in some cases those
rated “medium” were subsampled and only part of them were included. 
Here, as in the 1946 survey, the double sampling was introduced in 
order to permit disproportionate sampling of the various income groups 
and it is the disproportionate sampling rather than the mere classifica- 
tion by income levels that provides the gain to be derived by double 
sampling in this instance. The blocks of the type classified as low rent 
blocks for the 1946 survey were not over-sampled at all; hence no rating
of the dwelling units or subsampling within these blocks was necessary. 

In order to assure that the subsampling should be done without bias, 
instruction on the random start to be used for the subsampling was 
withheld from the interviewer until the listing by income groups was 
completed. The device used for this was a seal which was placed over 
the number showing the random start and which the interviewer re- 
moved only after listing of the income groupings was completed. In 
connection with this procedure, it may be observed that differences 
between interviewers in the classification of dwelling units into income 
groups do not introduce a bias in the survey results.
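
The subsampling step can be sketched as a systematic selection within each rating group, with the random start drawn independently of the listing (the role played by the seal). The intervals, ratings, and function names below are illustrative assumptions, not the survey's actual instructions.

```python
import random


def systematic_subsample(items, interval, start):
    """Keep every `interval`-th item, beginning at index `start`."""
    return items[start::interval]


def select_dwellings(listing, intervals, rng):
    """Subsample a rated listing; return (dwelling, rating, weight) tuples.

    listing:   list of (dwelling_id, rating) pairs as rated by the interviewer.
    intervals: dict mapping rating -> sampling interval (1 keeps every dwelling).
    rng:       random.Random instance supplying the random starts.
    """
    selected = []
    for rating, interval in intervals.items():
        group = [d for d, r in listing if r == rating]
        start = rng.randrange(interval)  # fixed before the ratings are known
        for dwelling in systematic_subsample(group, interval, start):
            # Each retained dwelling stands for `interval` listed dwellings.
            selected.append((dwelling, rating, interval))
    return selected


# Hypothetical cluster: 2 "high", 4 "medium", and 6 "low" dwellings.
listing = (
    [(f"H{i}", "high") for i in range(2)]
    + [(f"M{i}", "medium") for i in range(4)]
    + [(f"L{i}", "low") for i in range(6)]
)
intervals = {"high": 1, "medium": 2, "low": 3}
sample = select_dwellings(listing, intervals, random.Random(1947))
print(len(sample))                  # 6 dwellings retained
print(sum(w for _, _, w in sample))  # 12, the size of the full listing
```

Because every listed dwelling keeps a known, nonzero chance of selection whatever rating it receives, and is weighted by the inverse of that chance, a difference between interviewers in how they rate dwellings changes the efficiency of the sample but not the expected value of the weighted estimate.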

The modifications in sampling procedure for the 1947 survey were 
introduced after consultation with Mr. Morris Hansen on the specific 
problems of this survey. The details of the 1946 sample were developed 
under the direction of Mr. Earl Houseman. 


SIZE OF SAMPLE AND SAMPLING RELIABILITY 


The procedures of stratifying blocks by rent groups and, super- 
imposed on this, of stratifying dwelling units by probable income levels 
were introduced in order that the accuracy of a sample of 3,000 spend- 
ing units might be as great as possible within a reasonable cost. The 
question may be raised, however, as to why the size of sample was 3,000 
rather than, say, 2,000 or 4,000. The answer is that consideration of the
sampling errors of important items of information to be covered by the 
survey indicated that a sample as large as this was needed, but that for 
national estimates of the type contemplated, a sample of size 3,000 
would satisfy minimum needs and yet remain within budgetary limita- 
tions. 

The important point to be kept in mind in determining sample size 
is that the sampling errors for different items of information may vary 
sharply. For example, if about 65% of the households have war bonds 
and 35% do not, the sampling error for estimates of these percentages 
from the present survey will be something over one percentage point. 
This means that except for errors in interviewing or other biases the 
probability is 95% that the estimate of those having bonds will be within
about 2½ percentage points of the true proportion. (It is, of course,
quite likely that the actual error from sampling would be much smaller 
than this.) In contrast to an estimate of this type, an estimate of the 
amount of demand deposits in banks for those with incomes of $10,000 
or more a year would be subject to an extremely large error. In between 
these extremes are many items having relatively small sampling errors 
and many others having moderate or large sampling errors. The sample 
size, therefore, has to be large enough to assure a minimum degree of 
reliability for the most variable item for which information is required. 
Conversely, once the sample is drawn and the interviews are taken the 
estimates that may be made and the conclusions that may be reached 
are definitely restricted to those for which the size of sample is ade- 
quate. For example, for the Consumer Finances Survey, it is not pos- 
sible to make estimates for local areas. Also, with the highly skewed 
distribution of demand deposits, the mean holding of deposits is sub- 
ject to a rather large sampling error. For many purposes, however, the 
median holding is a satisfactory measure and it has a much smaller 
sampling error when the distribution is highly skewed. 
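
The war bond example can be checked with a short calculation. Under simple random sampling, the standard error of a proportion of 65% from 3,000 interviews is a little under one percentage point; a clustered area sample is less efficient, and the design effect of 2.0 used below is purely an assumption chosen for illustration, not a figure reported for the survey.

```python
import math


def srs_standard_error(p, n):
    """Standard error of a proportion under simple random sampling."""
    return math.sqrt(p * (1 - p) / n)


def design_standard_error(p, n, design_effect):
    """Standard error after inflating the variance by a design effect."""
    return srs_standard_error(p, n) * math.sqrt(design_effect)


se_srs = srs_standard_error(0.65, 3000)
se_design = design_standard_error(0.65, 3000, design_effect=2.0)  # assumed value
half_width = 1.96 * se_design  # approximate 95% limit

print(round(se_srs, 4))      # about 0.0087
print(round(se_design, 4))   # about 0.0123
print(round(half_width, 3))  # about 0.024
```

With that assumed design effect, the standard error is somewhat over one percentage point and the 95% limit is close to 2½ percentage points, in line with the figures quoted in the text.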

In summary, the Consumer Finances Sample was designed to secure 
about 3,000 interviews distributed in such a way that those at the upper 
end of financial distributions should have an adequate representation. 
In carrying through this plan, the basic design was similar to that for 
other enumerative surveys up to the point of selecting dwelling units 
within the primary sampling units. Here the dwelling units finally to
be included were selected by area sampling and use was made of strati- 
fication by means of double sampling in order to increase the represen- 
tation among the higher income groups within each primary sampling 
unit. The sample size was adequate for national estimates of most important
items, but not large enough to permit all the types of classifications
that might be desired or for the making of regional estimates.












































CONTRIBUTION OF PSYCHOLOGICAL DATA TO 
ECONOMIC ANALYSIS* 


GEORGE KATONA

University of Michigan, Survey Research Center

The recent use of sample interview surveys as a tool of
economic research1 is based on certain theoretical assumptions
that concern the need for and the value of psychological
analysis of economic behavior. An attempt will be made in
this paper to outline the theory of what may be called “economic
psychology.” Examples will be cited from research
completed, or now under way, to illustrate the applications
and the limitations of this research method.


I. THEORY OF ECONOMIC PSYCHOLOGY 


Collection and extrapolation of aggregate data appears to be at 
present the most usual method of economic research and the usual tool 
of economic predictions. Data on the gross national product, national 
income, the expenditures and savings of all consumers in a given past 
period, or—in business economics—data on aggregate production, 
sales and profits of all businesses or of certain types of businesses may 
be mentioned as the outstanding examples of the instruments of cur- 
rent economic statistics. These are global data or, if divided by the 
number of people or firms, averages. 

In spite of their obvious great importance, these data need to be 
supplemented. The supplementation may take three directions: 

First, and this is of course widely recognized, in addition to aggregate 
or macro-economic data, micro-economic data are needed; that is, in- 
formation concerning individual units—families (or preferably spend- 
ing units) and firms. 

The reason for this requirement is partly a statistical one. An average 
unaccompanied by data about the distribution may be misleading. To 
illustrate, savings are computed by the Securities and Exchange Com- 
mission for each quarter by adding estimated increases in personal 

* Paper presented before the American Statistical Association, Atlantic City, January 25, 1947. 

1 The Survey Research Center of the University of Michigan conducted the Survey of Consumer
Finances in 1947 and the Division of Program Surveys, U. S. Department of Agriculture, the Survey of
Liquid Asset Holdings, Spending and Saving in 1946. The same personnel was in charge of both surveys.
They were sponsored by the Board of Governors of the Federal Reserve System and benefited from the
advice and participation of the Division of Research and Statistics of the Board. Extensive summaries
and discussions of the surveys appeared in the following issues of the Federal Reserve Bulletin: June,
July, and August 1946, and June, July, and August, 1947. From 1942 to 1946 the Division of Program
Surveys was also engaged in a research program for the Savings Bonds Division of the U. S. Treasury.
References to the history of this research are contained in an article by A. Campbell and G. Katona,
“A National Survey of Wartime Savings,” Public Opinion Quarterly, 1946, pp. 373 ff.
bank deposits, war bond holdings, mortgage payments, insurance
premiums, and so forth. In the third quarter of 1946, for example, con- 
sumer savings were thus estimated to exceed 4 billion dollars, or 10 
per cent of the disposable income of all consumers. These savings may 
have originated in many different ways; it is possible, for instance, that 
each consumer saved 10 per cent of his income; or it may be that a 
small proportion of the consumers, those with large incomes, saved 40 
per cent, and the vast majority of people saved nothing. The global 
figures furnish no clue to what actually happened. To understand the 
average rate of saving, we must have information on its distribution. 
Similarly, a necessary adjunct to estimates of average or aggregate in- 
comes, liquid asset holdings, business sales, profits, etc., is information 
about their size distribution. 
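
The point that an aggregate rate says nothing about its distribution can be made concrete with two invented sets of consumers. The incomes and savings below are hypothetical and chosen only so that both cases give the same aggregate rate of 10 per cent.

```python
def aggregate_rate(incomes, savings):
    """Total savings as a fraction of total income."""
    return sum(savings) / sum(incomes)


def share_of_savers(savings):
    """Fraction of consumers who saved anything at all."""
    return sum(1 for s in savings if s > 0) / len(savings)


# Six consumers with an income of 500 and one with an income of 1,000.
incomes = [500] * 6 + [1000]

# Case A: every consumer saves 10 per cent of income.
savings_a = [50] * 6 + [100]

# Case B: only the high-income consumer saves, at 40 per cent of income.
savings_b = [0] * 6 + [400]

print(aggregate_rate(incomes, savings_a))  # 0.1
print(aggregate_rate(incomes, savings_b))  # 0.1
print(share_of_savers(savings_a))          # 1.0
print(share_of_savers(savings_b))          # one saver in seven
```

The aggregate figure is identical in the two cases, while the proportion of consumers who save differs from all of them to one in seven.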

Information is also needed on the distribution of changes in economic 
magnitudes. For instance, when the SEC computes that the average 
rate of saving declined from 15 to 10 per cent, the question arises 
whether such a decline is due to actions of a small minority or of a 
great majority of people. The statistical significance of the change, and 
its bearing on past developments, differ according to the frequency 
distribution of individual changes. 

Information concerning individual units is needed further for the 
sake of dynamic analysis. This has been clearly pointed out in at least 
two papers that have appeared during the last year. In his analysis of 
the history of the 1920’s, Joseph A. Schumpeter said at the Cleveland 
meeting of the American Economic Association: “If, in a given year, 
one industry makes 100 millions and another loses 100 millions, these 
two figures do not add up to zero or, to put it less paradoxically, the 
course of subsequent events generated by this situation is not the same 
as that which would follow if both had made zero profits. This is one of 
the reasons why theories that work with aggregates only are so misleading.”2
Arthur F. Burns wrote in his 1946 annual report as Director of
the National Bureau of Economic Research as follows: “Although
broad index numbers or aggregates give useful summaries, they tell
nothing of the processes by which they are fashioned.”3

The “processes by which they are fashioned” include the processes 
by which consumers and business arrive at those economic decisions 
the end products of which add up to statistical aggregates. The analysis 
of the formation of these decisions—for example, the search for reasons 

2 “The Decade of the Twenties,” American Economic Review, XXXVI, Supplement, 1946, p. 5. 


3 Economic Research and the Keynesian Thinking of our Times, National Bureau of Economic
Research, 1946, p. 22. 
why people have reduced their rate of saving, or why business firms
have increased their capital expenditures—must be conducted on the 
level of individual families and firms. Microscopic financial data are 
indispensable for this purpose, but they alone are not sufficient. 

Secondly, then, aggregate or average financial data referring to past 
periods must be supplemented by information of a forward-looking 
character. The Stockholm school introduced the distinction between 
ex-post and ex-ante data, and Keynes, Hicks, and many other econo- 
mists laid great stress on the significance of expectations or anticipa- 
tions for economic analysis. They emphasized that investment deci- 
sions are usually based on anticipated market conditions, and pointed 
out that even consumers’ propensity to spend may be influenced by 
their expectations. 

The introduction of the term expectation is of course not sufficient. 
We must measure expectations; that is, determine their direction, 
elasticity, and frequency distribution. We must, further, explain ex- 
pectations; that is, relate them to other factors that arouse them. 

The third, and perhaps most radical, supplementation of aggregate 
economic data concerns the search for and the measurement of factors 
underlying and determining the formation of decisions by individual 
units. Motives, attitudes, and also expectations, are variables that are 
intermediate between a given environmental situation and the resulting 
overt behavior. 

The stimuli eliciting economic actions are widely studied. Changes in 
income, in prices, or in demand, may serve as examples. Similarly, the 
responses, such as stability, increase, or decrease in the rate of saving, 
in production, or sales, are being studied. Information concerning 
economic stimuli and responses may, however, be insufficient, for in- 
dividuals and groups may react differently to the same stimuli—accord- 
ing to their past experience and according to their understanding of the 
environment, which depends on the organization of the stimuli, on 
people’s attitudes and motives. 

When decisions are made automatically or routinely, information on 
overt behavior may be sufficient. But we do not know a priori or in 
advance whether this is the case. Analysis of attitudes and motives is 
required to find out whether in a given situation it is permissible to 
neglect the intervening variables. Possibly, the situations in which 
intervening variables may not be neglected are the most interesting 
ones, marking turning points in business cycles or crucial stages of in- 
flation or deflation. 

The same argument may be put in the following terms: If the classical
pattern of self-regulatory market economy prevailed in our times,
then the study of intervening variables would be of lesser importance. 
With the advent of monopolistic competition, however, when business 
executives have a considerable range of subjective choice of policies, 
the situation is different. Or, to use the concepts introduced by E. G. 
Nourse4: if automatic or authoritarian price making were typical for
our economy, we could perhaps dispense with some of the study of 
attitudes and motives. With the prevalence of administered prices, 
however, the reasons for changing or not changing prices must become 
an additional source of information. 

The current emphasis on economic dynamics does not relieve us of 
the responsibility of studying intervening variables. Economic dy- 
namics has been conceived by econometricians as sequence analysis 
differing from static analysis by nothing but the fact that every quan- 
tity is dated, in other words, that in addition to the quantities their 
time lag is also considered. No doubt this represents a step forward. 
Yet the basic problem remains. Suppose we find correlations between 
dated magnitudes, for example, between dwelling occupancy rates in 
year one and interest rates and building rates in year two, or between 
the corn-hog ratio in year one and hog production in year two. Is it 
then assured that the relationships established for a past period will 
prevail in the future? Additional studies are needed to furnish clues in 
this respect. 

Since description and analysis of economic behavior is widely held to 
be one of the major tasks of economics, the current neglect of interven- 
ing variables, that is, of dynamic psychological research, may need 
some explanation. Some of the current economics may be called “eco- 
nomics without psychology” in the sense that, except in a few state- 
ments in introductions to textbooks, it disregards the major tenets of 
scientific psychology, namely, the principle of the flexibility and plia- 
bility of human behavior, and does not make use of empirical studies 
for the description and analysis of behavior.5

Concerning the question of how that situation developed, it may 
suffice to point to two of its probable major causes. One is the reaction 
to the traditional nineteenth century psychology that has been widely 
used in economics, to the concepts of hedonism and of the rational 
economic man with perfect knowledge and foresight. The other cause 
appears to be the belief that psychological factors are indeterminate, 

4 Price Making in a Democracy, Washington, 1944.

5 It hardly needs to be said that in addition to “economics without psychology” we also have
“economics with psychology” both in the form of empirical studies and of theoretical analyses that
make use of psychological insight into the behavior of businessmen, investors, speculators, or consumers.
not measurable, not quantifiable.6 These historically understandable
opinions have lost their weight with the development of modern 
psychology. 

More important than the search for historical reasons is the question: 
By what means have some economists achieved the exclusion of 
psychology from economic analysis? How does it happen that psycho- 
logical analysis is considered not essential or not needed in some current 
economic studies? It appears to the author that psychology has been 
excluded from economics by two means. One is the reliance on a mech- 
anistic view of human behavior, and the other the reliance on the law 
of large numbers. 

If human beings were automatons, if there were a one-to-one cor- 
respondence between economic stimuli and resulting economic actions, 
the study of the human factor could be rightly omitted. Such a correspondence
is assumed, for instance, when it is argued that the propensity
to consume is a function of income alone.7 Given the disposable
income, and its distribution, it is assumed that consumer expenditures 
can be predicted, and the human factor disappears. Or if it is held that 
inflationary price increases are the results of certain changes in the 
money supply, then too the human factor, expectations, fears, etc., 
need not be studied. We can then say, as has been said often during the 
last few years, that money competes for goods, and we need not com- 
plicate the situation by saying that consumers and businessmen, 
because of such and such reasons, use their money to compete for goods. 
Or, to give a last example, if profit expectations followed past profits 
and nothing but past profits, the human factor could again be neg- 
lected. 

Some scholars seem to acknowledge that it is not realistic to treat 
human behavior as automatic. Individual people and individual firms 
act differently. Yet for the economy as a whole this does not matter. It 

6 Cf. G. Haberler, Prosperity and Depression, League of Nations, 1937: “Reactions are conventionally
called psychological because of their (in a sense) indeterminate character,” p. 134.

7 Most modern model-builders argue on the basis of that assumption, as clearly stated for instance
by J. L. Mosak: “The volume of consumer expenditure is a function of the disposable income in the
hands of individuals” (Econometrica, XIII, 1945, p. 30); but Mosak, and others as well, carefully add
that this assumption represents an extreme oversimplification (ibid., p. 33). Yet, with the exception of a
few economists who discuss the possibility of influencing the propensity to save, it is generally held that
if incomes are stable, expenditures and savings are also stable. This is implicit, for example, in one of
the basic theses of Keynes: “The behavior of the public is, in general of such a character that they are
only willing to widen (or narrow) the gap between their income and their consumption if their income is
being increased (or diminished).” (The General Theory of Employment, Interest and Money, p. 248;
italics mine; in a great many other respects, of course, Keynes does make use of psychology). Schumpeter
then summarizes that the Keynesian consumption function “posits the existence of a unique
relation . . . between consumption and income alone” (Review of Economic Statistics, XXVIII, 1946, p.
195). A different view is outlined in a note by G. Katona and R. Likert in the same issue of the Review
(pp. 197-199).
is argued that there is an inertia of large numbers; the individual differences
cancel out. It is widely held that what one person will do is uncertain,
but what thousands of persons will do is not equally uncertain.
The thesis is that past experience enables us to make reliable state- 
ments about the probable actions of thousands of individuals but not 
about the probable actions of a few individuals. This application to
economic analysis of the principle used in actuarial tables is correct 
only if two conditions are fulfilled: namely, (1) if the decisions and ac- 
tions of the thousands of persons are not due to the same causes, and 
(2) if nothing is known or can be found out about the factors influencing 
the decisions or actions of a few individuals. 

Possibly there are economic situations in which the law of large 
numbers does apply because only random factors prevail which cancel 
out. But possibly sometimes the dice are loaded; atmospheres or 
climates of opinion influence many people at the same time in the same 
direction so that the deviations add up.8 Only empirical research can
determine which of these situations prevails at a given time. 
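
The contrast between deviations that cancel and deviations that add up can be sketched numerically. The model below is an assumption made for illustration: each person's deviation is the sum of a purely individual component and a component shared by everyone (a "climate of opinion"), both normally distributed.

```python
import math
import random


def sd_of_average(n, sd_individual, sd_common=0.0):
    """Standard deviation of the average deviation of n persons.

    Individual components average out as n grows; the common component,
    shared by everyone, does not.
    """
    return math.sqrt(sd_common ** 2 + sd_individual ** 2 / n)


def simulate_average(n, sd_individual, sd_common, rng):
    """Draw one average deviation for n persons under the assumed model."""
    common = rng.gauss(0.0, sd_common)
    return sum(common + rng.gauss(0.0, sd_individual) for _ in range(n)) / n


# Only random individual factors: the average of 10,000 persons is very stable.
print(sd_of_average(10_000, sd_individual=1.0))                 # about 0.01

# A shared influence half as large as the individual one: the uncertainty of
# the average barely shrinks, however many persons are included.
print(sd_of_average(10_000, sd_individual=1.0, sd_common=0.5))  # about 0.5

print(simulate_average(10_000, 1.0, 0.0, random.Random(1)))     # close to zero
```

In the first case the law of large numbers applies; in the second the dice are loaded, and enlarging the number of persons does not remove the shared component.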

Doubts concerning the validity of the second proposition constitute 
the basis of survey methods. They originated in the belief that if in- 
dividuals know the factors that determine their future spending or 
saving performance9 and are willing to give information on these matters,
then it would be possible to obtain reliable data about a few individuals.
If, furthermore, a representative cross-section of people
could be sampled, then perhaps we would not have to rely solely on the 
law of large numbers to assess forthcoming developments of the entire 
economy. 

II. APPLICATIONS OF ECONOMIC PSYCHOLOGY 


How can economic psychology proceed? What are its tools? Like 
psychology, the study of behavior, economic psychology, the study of 
economic behavior, must be an empirical discipline. Theory has an 
important function in empirical science because fact finding must be 
based on hypotheses in order to lead to valid generalizations and 
ultimately yield verified or improved theories. But the tools of economic
psychology are empirical. There are, so it seems, three methods
at its disposal. 

One is the application of psychological principles and psychological 
findings to economic behavior. Findings made in empirical studies of 


8 The analysis of such situations is usually considered a topic of social psychology. In this article no 
use is made of the distinction between individual and social psychology. 

9 It is not necessary to assume that all future expenditures are planned in advance. It suffices for
our argument if certain major expenditures were dependent upon external circumstances and upon 
subjective motives, incentives, and attitudes known to the consumers. (Cf. also J. R. Hicks, Value and 
Capital, Oxford, 1939, p. 228, and R. A. Young and D. McC. Holthausen, Federal Reserve Bulletin, 
March, 1947.) 
noneconomic behavior are often applicable to economic behavior. For
instance, in studying motivation—in analyzing such problems as
multiple motivation, hidden motives, rationalizations—economic 
psychology need not start from scratch but can make use of findings in 
other fields of behavior. It may also be mentioned that in a recent 
article the author tried to apply findings on the origin of expectations 
derived from experimental studies of the learning process, to economic 
behavior.10

A second method of economic psychology is the case study. Case 
studies may be undertaken, for instance, for the purpose of analyzing 
decision formation by individual firms. To be sure, case studies do not 
permit generalizations about the frequency of certain relationships.
But dynamic analysis may well be promoted when it is applied in great
detail to an individual case or a few cases.11

By far the most important method of economic psychology is the 
sample interview survey. Rensis Likert and his associates have de- 
veloped over many years interviewing methods which have proved to 
be applicable to and fruitful for economic research. Only a few of the 
main characteristics of the methods will be mentioned here:12

(1) A representative sample is drawn from a given universe. 

(2) Specific questions are formulated so that they can be asked of
all respondents in a uniform manner and can be answered by them in 
their own words, expressing shades of opinion and degrees of certainty 
or uncertainty, and giving reasons for the opinions or attitudes they 
have. 

(3) Interviewers are carefully trained so as to be able to conduct the 
interview in a conversational way and to establish good rapport with 
the respondents. Personal financial information may be obtained if 
the respondents understand the issue in question and the significance 
of the interview. 

(4) Coding techniques are developed for the quantification of 
opinions expressed in the respondent’s own words and analysis tech- 
niques are worked out that yield objective checks of the survey data. 

These methods have been used in the economic field to achieve three 
objectives: to collect information concerning attitudes, motives, plans, 

10 “Psychological Analysis of Business Decisions and Expectations,” American Economic Review, 
XXXVI, 1946, pp. 44 ff.

11 For instance, during the war, at the Cowles Commission for Economic Research of the University
of Chicago, case studies were carried out concerning adherence to and evasions of price control, and the
reasons for either type of action. Cf. G. Katona, Price Control and Business, Principia Press, Blooming- 
12 Aspects of these methods that were used in the surveys enumerated in the first footnote
to this article are discussed in the May 1946 issue of the Journal of Social Issues (edited by Angus
Campbell). Concerning sampling, reference should be made to the paper of Roe Goodman in this issue
of the Journal of the American Statistical Association and to the Federal Reserve Bulletin of June 1947.
intentions and expectations; to collect micro-economic data on the
distribution of income, savings, and liquid asset holdings; and to deter- 
mine the relation of attitudinal and financial data. 

The application of these methods will be illustrated by discussing 
one major problem of several recent surveys. This problem can be 
stated either in economic or in psychological terms. Briefly, for econo- 
mists the issue is as follows: Before the war consumers saved about 10 
per cent of their disposable incomes; during the war over 25 per cent; 
how much will they save after the war—the same as during the war, 
the same as before the war, or much less because of using up their
wartime accumulations of savings? The same idea may be expressed in
psychological terms as follows: Did people develop during the war a 
habit of saving which will be carried over to the postwar years, or was 
the wartime rate of saving due to special attitudes that have disap- 
peared or were even reversed with the end of the war? 

Several different approaches have been tried out to obtain informa- 
tion on this issue. Since human behavior is complex and multiple 
motivation is the rule, not the exception, little reliance can be placed on 
one or a few direct questions. 

First, why did people save during the war at a much greater rate 
than before the war? Some experts have answered this question by 
saying that wartime saving was forced saving. The quantity of goods 
available was restricted, and therefore people were unable to spend the 
same portion of their income as before the war. Or, in a slightly differ- 
ent formulation: A part of one’s income is customarily earmarked for 
the purchase of durable goods; they were not available during the war 
and people saved to have money for purchasing them when they should 
again be available after the war. 

When the people themselves were asked, in surveys conducted both 
during and after the war, they did not give this answer. They said, 
mostly, that they were saving to be protected against future con- 
tingencies—for a rainy day, for old age, for their children. Evidence 
indicated further that patriotism and personal solicitation—through 
the payroll deduction plan, for instance—also acted as powerful stimu- 
lants of saving. 

As is well known, however, motives are elusive. What are the chances 
of obtaining in a short interview information on real motives? No 
doubt, in some instances at least the chances are not too good. In spite 
of the survey findings the question concerning the real motives of war- 
time saving is not solved. But for our purposes it does not matter 
much whether or not we discovered the real deep-seated motives of 
wartime saving. The kind of motives and purposes of saving of which 
people are aware at present, and of which they speak repeatedly, do
influence their current and future behavior even if they happen to be 
rationalizations. What matters most is that people in general did not 
feel during the war that they were compelled to save and do not have 
this opinion at present. On the contrary, they said they were happy to 
have been able to save, they believed and still believe that saving is 
very important, and they desire to continue to save. 

The second problem studied was: Who saved during the war? Many 
more people than before the war. In 1945, for instance, over two-thirds 
of the Nation’s spending units, as against about one-third in 1935-36. 
Yet, savings were highly concentrated during the war. In 1945 10 per 
cent of the spending units, those 10 per cent who saved most, ac- 
counted for over 60 per cent of the Nation’s net savings. The concen- 
tration of current holdings of accumulated bank deposits and war 
bonds is similar, and in 1946 one-quarter of the spending units had none 
of these assets and another quarter only insignificant amounts of such 
assets. 

Therefore, from the point of view of their economic effects, we are 
not concerned with the saving or dissaving decisions of all people. The 
question is, will the large holders of liquid assets use much, some, or 
none of these liquid assets, and will the large savers continue to save 
or will they reduce their rate of saving? 

Third: How do people, and especially the large holders, regard their 
liquid assets? War bonds and bank deposits are called liquid because 
they can be cashed or withdrawn any time and used for consumption 
expenditures. People, however, so the surveys showed, do not regard 
them or most of them as liquid. They regard their war bonds and sav- 
ings deposits, many even their checking accounts, as permanent posses- 
sions to be kept for security, or to be spent for purposes that enhance 
their security—such as buying a home, or investing in business. Most 
people feel opposed to spending their liquid assets for consumption and 
consider even buying a car or other durable goods as inappropriate uses 
of their wartime savings. 

Fourth: Does that mean that no liquid assets were being spent in 
1946 and that none will be spent in 1947? By no means. Many people 
used and will use some of their war bonds or bank deposits for living 
expenses, because of unemployment or strikes or decline in real income,
and even for vacations and luxuries. Although for most individual
families these withdrawals were small, the sum total of these
amounts represented a few billion dollars in 1946 and constituted a 
significant addition to consumer expenditures. 

In other words, some people used wartime savings because the
insecurity against which they had saved had actually come upon them;
some others used savings because only in that way could they satisfy 
their urgent needs. Not the “average attitudes” but the distribution 
of specific attitudes must be known, and must be confronted with 
micro-economic data (employment, income, liquid asset holdings of 
individual families), to understand the origins of economic effects. 

Fifth: The survey conducted at the beginning of 1946 showed that, 
despite the great hesitancy to use accumulated wartime savings, con- 
sumer expenditures in relation to consumer incomes would rise. First, 
people generally indicated a willingness to borrow, that is, to buy 
automobiles and other durable goods on installment. The fact that most 
people became free of debt during the war is of significance. 

Then, also, beginning with VJ-Day, survey indications pointed toward
a reduction of additional savings. While people generally were
opposed to using accumulated assets for consumption purposes, they 
felt a much lesser reluctance to save a smaller part of their incomes. 
Saving less for buying durable goods was approved; saving less in order 
to maintain one’s level of living in spite of price increases—often not 
compensated for by wage increases in 1946—was likewise approved. 

A few years ago, during the war, the author heard a well-known 
banker predict that prices would rise by 100 or 200 per cent shortly 
after the end of the war. He argued that postwar purchasing power 
would consist (1) of the income earned while producing the goods con- 
sumed after the war and (2) of the accumulated wartime savings. The 
surveys cited here lead to a different inference. It appeared as early as
January 1946 that the inflationary pressure might be relieved as soon as
the supply of goods came into balance with current incomes and, further,
that an extended period of low saving rates might be expected, but not
a period of net dissaving. People’s desire to hold on to accumulated
savings and to continue to save is so strong that it is improbable that 
the use of accumulated assets will exceed new savings. These assets may 
have a beneficial effect, instead of an inflationary effect, by supporting 
purchasing power during the years to come. 

When in the fall of 1946 the author spoke in these terms before the 
National Industrial Conference Board an economist replied: “Formerly 
I learned that deficits are bad; now I hear that deficits are good because 
they create liquid assets.” This is, of course, an oversimplification. Not 
the events alone, but human reactions to the events are important. 
Huge deficits create liquid assets only if people in times of rising in- 
comes decide to save a large part of their incomes (as they did during 
the war). Liquid assets are beneficial for the economy only if people 
decide not to spend them in inflationary times but to part with them 
gradually when deflationary tendencies threaten. It seems therefore
that generalizations about the relation of economic magnitudes do not 
suffice; the problem consists of investigating the effect of changes in 
the economic environment on human decisions. 

In discussing the applications of survey research to economic anal- 
ysis we have presented only one side of the picture. It is necessary to 
add a few words of caution. The methods used are still in a develop- 
mental stage and the problems that are still unsolved are numerous and 
difficult. 

The basic question that must be raised is: are psychological data 
assembled by interviewing surveys reliable and useful indicators of 
economic behavior? More specifically: are attitudes stable and endur- 
ing, or are they changing so rapidly that they cannot shed light on 
future decisions and actions? 

The problem of the predictive value of attitudes, intentions and ex- 
pectations expressed by individuals is a complex one which can be 
clarified only by new investigations that are now under way. The 
probable solution can best be expressed by warning against certain 
possible misunderstandings. 

First, attitudinal data should be used only to supplement other 
financial data and not to supplant them. Information on motives, attitudes,
and expectations helps in assessing past trends and in predicting
probable future trends, but cannot do the job alone. 

Secondly, the emphasis on attitudinal information should not imply 
that purely psychological theories of business cycles, inflation, deflation,
etc., are in order. Psychological factors must be taken into account
so as to understand these developments, but they are not autonomous. 
Attitudes and expectations that are enduring and powerful in framing 
actions do not arise without cause. They are intermediate variables, 
molding the understanding of economic events and their effect on 
people’s reaction, elicited by economic stimuli. If we oppose such 
formulations as “the propensity to save is a function of income” or 
“inflation is a function of changes in money supply,” we must not com- 
mit the mistake of arguing that propensity to save, or inflation, is a 
state of mind the origin of which is noneconomic. Inflation results from 
a set of subjective expectations that arise when people become con- 
vinced that certain given economic forces operate in a certain direction. 
Increase or decrease in the propensity to save depends upon a set of 
attitudes and expectations that arise from one’s understanding of the 
financial situation and prospects. Psychological factors and traditional 
economic factors are interwoven in one unified pattern and must be 
studied together to understand economic behavior. 
THE PROBLEM OF PLOT SIZE IN LARGE SCALE
YIELD SURVEYS


P. V. SUKHATME, Ph.D., D.Sc.


Corrections to tables published in Volume 42, No. 238, 
June 1947 


Table 2 


column 8, row 9: Under number of plots, read 24 in place of 23. 

column 9, row 2: Under average yield, read 626.2 instead of 625.2. 
column 9, row 9: Under average yield, read 606.4 instead of 632.8. 
column 11, row 9: Under average yield, read 620.4 instead of 612.2. 


Table 3 


row 1: Substitute the following for the figures shown: 
37.0, 20.6, 77, 1.83, 58.4, 14.8, 106, 3.87 
row 2: Substitute 69.1 instead of 69.9; 106 instead of 104; 4.51 instead of 4.52 


Table 4 


2nd column, 16½′ △, row 4: Read 687 instead of 686.
2nd column, 3′ ○, row 2: Read 703 instead of 683.
row 8: Read 528 instead of 529.
row 10: Read 560 instead of 561.
2nd column, 2′ ○, row 10: Read 608 instead of 610.


Table 5 


column 8, row 3: Read 1472.9 instead of 1472.8. 
column 14, row 2: Read 8.8 instead of 8.6. 


Table 7 


column 14 under Kaikalur, row 2: Read 1537.5 instead of 1637.5. 
column ‘t’ should read:

.67
2.17*
2.40*
5.28**

* Significant at 5% level.
** Significant at 1% level.


Table 8 


column 5, row 2: Under Kaikalur read 1537.5 instead of 1637.5. 


CORRECTION TO THE CITATION OF 
WILLIAM G. MADOW


In the citation of William G. Madow as Fellow of the Association 
which was printed in the March 1947 Journal, Volume 42, No. 237, 
he was incorrectly listed as United States Consul, São Paulo, Brazil.
We regret the error. Dr. Madow is Visiting Professor in Mathematical
Statistics at the University of São Paulo.
BOOK REVIEWS


Edited by 
Oscar Krisen Buros
Rutgers University 


Measuring Business Cycles. Arthur F. Burns (Director of Research; Professor 
of Economics) and Wesley C. Mitchell (Director of Research, 1920-1946; Pro- 
fessor Emeritus of Economics). (National Bureau of Economic Research; Colum- 
bia University). Studies in Business Cycles, No. 2. New York 23: National 
Bureau of Economic Research, Inc. (1819 Broadway), 1946. Pp. xxvii, 560. $5.00. 


Review by Leonid Hurwicz
Associate Professor of Economics, Iowa State College 


IN REVIEWING a book of the caliber of Measuring Business Cycles one feels
a sense of responsibility. Not only is it the result of years of painstaking
research work and scholarship, but it also is a forerunner of other similar 
studies. (Cf. p. 22.) Moreover, the lucid argument, as well as the prestige 
of the authors and of the National Bureau of Economic Research, is likely 
to make the book an outstanding influence in the field of study of economic 
fluctuations. 

Under these circumstances it becomes the reviewer’s duty to try to evaluate
the basic postulates of the authors’ approach. If the results of such evaluation
appear excessively negative, let this be interpreted as symptomatic of
the desire to stress what in the reviewer’s opinion are among the fundamental 
problems to be faced by the science of economics. 

The book’s point of departure is a definition of the object of study: 

  Business cycles are a type of fluctuation found in the aggregate economic
  activity of nations that organize their work mainly in business enterprises:
  a cycle consists of expansions occurring at about the same time in many
  economic activities, followed by similarly general recessions, contractions, and
  revivals which merge into the expansion phase of the next cycle; this sequence
  of changes is recurrent but not periodic; in duration business cycles
  vary from more than one year to ten or twelve years; they are not divisible
  into shorter cycles of similar character with amplitudes approximating their
  own.

It is immediately pointed out by the authors that the definition is a tool 
of research, “subject to revision and abandonment if not borne out by ob- 
servation” (p. 3). 

Given the object of study, the authors’ stated task is to carry out certain 
measurement procedures which will provide quantitative empirical knowl- 
edge of a phenomenon qualitatively described by the definition. There is 
undoubtedly great need for empirical quantitative information with regard 
to economic phenomena. One has only to glance at a “business cycle theory” 
textbook to see how many controversies could be solved with the help of 
such information. While a great many studies have been devoted to various 
segments of the economy, not many have had the scope of the Burns-Mitchell
book.

The authors state that Measuring Business Cycles is not an attempt to 
choose among alternative economic theories, although they express the hope 
that their results may serve as a basis for such choice in the future. (Cf. pp. 
8-10.) 

The chief objective is to obtain “measures” of certain characteristic fea- 
tures—timing, duration, amplitude—of the phenomenon under study. Once 
the measures of these “cycle¹ characteristics” are available, it becomes
possible to examine their changes over a period of time. (Chaps. 10-12.) The 
changes considered are of secular, cyclical, discontinuous, or erratic nature. 

To appraise such a procedure, let us suppose at first that a cycle is a well- 
defined phenomenon, characterized by the values of several variables (e.g., 
duration and amplitude), and that we are dealing with a sequence (more 
generally, a set of parallel sequences) of such phenomena. (The desirability 
of such an approach is discussed below.) Thus, we are dealing with a sequence 
of the type 


$$\{(x_1^1, x_2^1, \dots, x_K^1),\ (x_1^2, x_2^2, \dots, x_K^2),\ \dots,\ (x_1^N, x_2^N, \dots, x_K^N)\}$$

where $x_j^i$ is the value of the j-th cycle characteristic in the i-th consecutive
cycle; for instance, if duration is chosen as the first cycle characteristic, $x_1^2$
represents the duration of the second cycle.

The book’s last three chapters are devoted to the question of changes over
time (i.e., as $i$ increases) in the various cycle characteristics. Thus, choosing
$x_1$ as an example, the task is that of examining the behavior of the sequence

$$\{x_1^1, x_1^2, \dots, x_1^N\}$$

of values of the cycle characteristic $x_1$ (duration) for the successive cycles
$1, 2, \dots, N$.

A number of hypotheses with regard to the properties of such sequences 
are tested. The fact that such tests are made (and standard statistical tech- 
niques such as the analysis of variance applied) reveals a point that is not 
sufficiently well emphasized in the book, namely, that the object of the 
study, the “business cycle,” is a phenomenon characterized by a probability
distribution.²

Thus, one of the tests carried out is essentially that of a hypothesis that
all the $x_1^i$ come from the same (normal) population; the set of all admissible
hypotheses specifies (somewhat arbitrarily) that the elements of each of the
triplets $(x_1^1, x_1^2, x_1^3)$, $(x_1^4, x_1^5, x_1^6)$, $\dots$ come from the same population while two
different triplets may differ with regard to the means of their universes.
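
The kind of test described here can be illustrated with a one-way analysis of variance on consecutive triplets of cycle durations. The following is only a minimal sketch: the function name `anova_f` and the duration figures are hypothetical and are not taken from the book, and the sketch inherits the normality and independence assumptions that the reviewer questions.

```python
from statistics import fmean

def anova_f(groups):
    """One-way analysis-of-variance F statistic for a list of samples."""
    k = len(groups)                          # number of groups (triplets)
    n = sum(len(g) for g in groups)          # total number of observations
    grand_mean = fmean([v for g in groups for v in g])
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((v - fmean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical durations (in months) of nine consecutive cycles.
durations = [30, 36, 42, 40, 46, 52, 50, 56, 62]
triplets = [durations[i:i + 3] for i in range(0, len(durations), 3)]

f_stat, df_b, df_w = anova_f(triplets)
print(f"F = {f_stat:.2f} on ({df_b}, {df_w}) degrees of freedom")
```

A large F relative to the tabulated value for the stated degrees of freedom would lead to rejecting the hypothesis that all triplets share one mean.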


1 In the book’s terminology specific cycles in different variables are to be distinguished from (and
compared with) the reference cycles. 

2 The stochastic nature of the cycle characteristics is also implied by references to “random factors” 
and “disturbances.” 

3 The authors indicate that they are aware of the inapplicability of some assumptions (normality, 
independence) underlying the tests performed. Occasionally nonparametric tests are used (e.g., p. 482); 
Hence, implicitly, it is postulated, at least as a first approximation, that
$x_1^i$ is a chance variable of the form

$$x_1^i = \xi_1^i + u^i$$

such that the $u$’s are independent with mean zero and have a distribution
independent of $i$; the hypotheses of Chapters 10-12 refer to the nature of
variation of $\xi_1^i$ with $i$. Among the hypotheses tested we find that of a trend,
step function (with the location of the hypothetical steps contributed by 
outside knowledge of the economic system, e.g., the case of wars or industrial- 
ization stages, as conjectured by Mills), and long-period cycle (as postulated 
by Kondratieff and others). 

The possibility that the cycle characteristic sequences might be characterized
by $N$ distinct $\xi_1$’s or by other phenomena of complete lack of statistical
stability⁴ is of legitimate concern to the authors; in such a case the averages
(over $i = 1, 2, \dots, N$) would be “futile if not mischievous constructions”
(p. 466).

Another interpretation of the meaning of instability is the large size of 
the variance of u, the random component of the variations over time of the 
various cycle characteristics. The tests of the hypotheses of the book’s last 
few chapters are designed to allay the fear that such instability is present 
to an extent that would deprive the averages of the cycle characteristics of 
the meaning attributed to them. In addition to the evidence presented in the 
book, reference is made to later monographs which “will demonstrate . . . 
that business cycle phenomena are far more regular than many historically- 
minded students believe,” although the “processes involved in business 
cycles behave much less regularly than theorists have been prone to assume” 
(p. 491). 

Unfortunately it is not easy to see just exactly what hypotheses are being 
tested against what alternatives. While the procedures used imply a proba- 
bilistic nature of the phenomena under study, a rigorous statement of just 
what distributions are involved is nowhere to be found. The tools, terminol- 
ogy, and notation provided by modern statistical inference are hardly used. 
As a result, it is at times difficult to tell not only what is tested and what 
assumed but also whether the tests used have the required properties of 
consistency and power. 

So far the reviewer has gone along with Burns and Mitchell in accepting 
the study of the cycle characteristics and their sequences as a desirable ap- 
proach in investigating the time behavior of economic phenomena. It is now 
appropriate to examine this premise. Is it really necessary first to subject the 
original economic series to various “processing” procedures until such mag- 








otherwise comfort is sought in the fact that “even rough tests of statistical significance used with re- 
serve and discrimination serve this limited purpose” (p. 392). At times the fact seems to be overlooked 
that if, for instance, the assumption violated by the data is that of independence of successive observa- 
tions, one could avail oneself of the existing theory covering tests applicable to serially correlated data. 

4 A related problem is that of correlation between two economic series with regard to a given cycle
characteristic. (Cf. p. 481.) 
nitudes as amplitude and duration are obtained and then estimate and test
the various properties of those magnitudes? Clearly, since the cycle charac- 
teristics are functions (though not always unambiguously defined) of the 
original observations on the economic variables (prices, quantities produced, 
etc.), any hypothesis concerning the cycle characteristics is expressible in 
terms of the original economic variables. (The converse, however, is not nec- 
essarily true.) 

It would seem that the proper point of departure of the whole study would 
have been to state both what is assumed and what is to be tested or esti- 
mated in terms of the joint probability distribution of the original economic 
variables. The hypotheses may, of course, be of the nonparametric variety 
and need not be any more restrictive than whatever is implicitly postulated 
by Burns and Mitchell. 

Once it is known what the set of admissible hypotheses is and what type 
of statistical decision is to be made (estimation, hypothesis testing, choice 
among several hypotheses), the choice of appropriate statistical procedure 
to follow is no longer arbitrary. The general principles of statistical inference 
(especially the principle of minimizing the maximum risk, with maximum 
likelihood estimation and likelihood ratio tests as important special cases) 
provide the guidance, although in certain cases the mathematical difficulties 
involved in applying the principles are of a serious nature. 

It is conceivable that certain types of stochastic models might lead to 
procedures resembling the cycle characteristic approach. Until this has been 
shown to be the case, the desirability of this approach must be regarded as 
undemonstrated. In fact one can make a stronger statement: For an impor- 
tant class of models, that of the stochastic difference equation systems, the 
approach is statistically inefficient and probably also biased. This does not 
mean that for some other class of models the situation might not be different; 
but the burden of proof is on the proponents of the cycle characteristic ap- 
proach. The reviewer doubts that such a class of models can be found. 

It is important to realize that the phenomenon of the business cycle postu- 
lated in the volume’s fundamental definition might very well have been 
generated by a system of stochastic difference equations. Since the properties 
of such systems have been studied quite extensively and have also been used 
for interpretation of economic fluctuations, it would seem legitimate to ask 
how the Burns and Mitchell approach compares with certain available al- 
ternatives when the true phenomenon is of the stochastic difference equation 
type. 

For simplicity’s sake, consider the case of a second order difference equation

$$x_t = [\alpha_{10} + \alpha_{11}\phi(t)]\,x_{t-1} + \alpha_2 x_{t-2} + \alpha_0 + v_t. \tag{1}$$

Here $x_t$ is the value of some economic variable (say national income) at time
$t$; $v_t$ is the nonautocorrelated disturbance, assumed to have the normal
distribution with mean zero and a constant variance; the $\alpha$’s are parameters,
usually unknown. When $\alpha_{11} = 0$, the series $\{x_1, x_2, \dots\}$ will be of a quasi-periodic
(“cyclical”) nature. When $\alpha_{11} \neq 0$, an appropriate choice of $\phi(t)$
will make the amplitude and quasi-period (cycle duration) vary in a trend-like,
stepwise, or long-cycle fashion.
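
A series of this type is easy to generate numerically. The sketch below reads equation (1) as x_t = (a10 + a11*phi(t))*x_{t-1} + a2*x_{t-2} + a0 + v_t; the function names, parameter values, and starting values are illustrative assumptions, not quantities given in the review. The helper `quasi_period` applies the standard result for a second order difference equation with complex characteristic roots, which is the case in which the series oscillates.

```python
import math
import random

def simulate(n, a10, a11, a2, a0, sigma, phi=lambda t: 0.0,
             x_init=(0.0, 0.0), seed=0):
    """Generate x_1..x_n from
    x_t = (a10 + a11*phi(t))*x_{t-1} + a2*x_{t-2} + a0 + v_t,
    where v_t is independent normal noise with standard deviation sigma.
    x_init holds the assumed starting values (x_{-1}, x_0)."""
    rng = random.Random(seed)
    prev2, prev1 = x_init
    series = []
    for t in range(1, n + 1):
        v_t = rng.gauss(0.0, sigma)
        x_t = (a10 + a11 * phi(t)) * prev1 + a2 * prev2 + a0 + v_t
        series.append(x_t)
        prev2, prev1 = prev1, x_t
    return series

def quasi_period(a1, a2):
    """Period implied by the complex roots of z**2 - a1*z - a2 = 0."""
    if a1 * a1 + 4.0 * a2 >= 0.0:
        raise ValueError("roots are real: no quasi-periodic component")
    return 2.0 * math.pi / math.acos(a1 / (2.0 * math.sqrt(-a2)))

# Illustrative values: constant coefficients (a11 = 0) give a fixed quasi-period.
steady = simulate(200, a10=1.6, a11=0.0, a2=-0.8, a0=1.0, sigma=1.0)
print("quasi-period with a11 = 0:", round(quasi_period(1.6, -0.8), 2))

# A linear phi(t) lets the coefficient on x_{t-1} drift, so the quasi-period drifts too.
drifting = simulate(200, a10=1.2, a11=0.4, a2=-0.8, a0=1.0, sigma=1.0,
                    phi=lambda t: t / 200)
print("quasi-period at start and end:",
      round(quasi_period(1.2, -0.8), 2), round(quasi_period(1.6, -0.8), 2))
```

Choosing a step function or a slow sinusoid for `phi` would give the stepwise and long-cycle variants mentioned in the text.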

Now the Burns-Mitchell approach would require that the series
$\{x_1, x_2, \dots\}$ should first be decomposed into individual cycles and each cycle
into the typical nine stages; in this process a great many decisions with
regard to timing, etc. would have to be made. The principles on which such
decisions are based, in addition to often being ambiguous,⁵ are of a somewhat
arbitrary and mechanical nature. Once the “processing” of the series has been 
completed (no attempt is made here to describe the procedure in detail), it is 
possible to compute for each cycle its amplitude, duration, and other charac- 
teristics. If suspicion of, say, trend exists, the sequence of the amplitudes 
would be tested for trend as though it were a random sample from a normal 
universe. Then the sequence of cycle durations would be subject to a similar 
test, and possibly other characteristics as well. 

On the other hand the likelihood ratio test for presence of trend in equation
(1) would be essentially the test of the composite hypothesis $\alpha_{11} = 0$. There
would be only one such test; it would simultaneously extract the information
from every observation in the original time series $x_t$ (including the
relevant variation in amplitude and quasi-period without either one being
explicitly computed), and would thus use the sample information in a more
efficient manner; moreover, no two investigators using the same significance
levels could disagree as to outcome, which could scarcely be claimed for the
book’s method.
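
One way to see what such a test involves is the sketch below. It reads equation (1) as x_t = (a10 + a11*phi(t))*x_{t-1} + a2*x_{t-2} + a0 + v_t and makes several assumptions of its own: phi(t) is taken as known, the first two observations are treated as fixed, and the statistic n*log(RSS_restricted/RSS_full) is compared with the 5 per cent point of chi-square with one degree of freedom (3.841), which is only a large-sample approximation. All function names and parameter values are hypothetical.

```python
import math
import random

def simulate(n, a10, a11, a2, a0, sigma, phi, seed=0):
    """Draw x_1..x_n from the assumed form of equation (1), starting from zeros."""
    rng = random.Random(seed)
    prev2, prev1, series = 0.0, 0.0, []
    for t in range(1, n + 1):
        x_t = (a10 + a11 * phi(t)) * prev1 + a2 * prev2 + a0 + rng.gauss(0.0, sigma)
        series.append(x_t)
        prev2, prev1 = prev1, x_t
    return series

def solve(a, b):
    """Solve a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (m[i][n] - sum(m[i][j] * z[j] for j in range(i + 1, n))) / m[i][i]
    return z

def rss(rows, y):
    """Residual sum of squares from least squares of y on the regressor rows."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))

def lr_statistic(x, phi):
    """n * log(RSS_restricted / RSS_full) for the hypothesis a11 = 0."""
    y = x[2:]
    # List position p holds x at time p + 1, so phi is evaluated at p + 1.
    full = [[x[p - 1], phi(p + 1) * x[p - 1], x[p - 2], 1.0]
            for p in range(2, len(x))]
    restricted = [[r[0], r[2], r[3]] for r in full]  # drop the phi(t)*x_{t-1} term
    return len(y) * math.log(rss(restricted, y) / rss(full, y))

n = 300
phi = lambda t: t / n  # assumed linear trend, scaled to (0, 1]
with_trend = simulate(n, 1.2, 0.4, -0.8, 1.0, 1.0, phi, seed=1)
no_trend = simulate(n, 1.4, 0.0, -0.8, 1.0, 1.0, phi, seed=1)
print("LR statistic, series with trend:   ", round(lr_statistic(with_trend, phi), 2))
print("LR statistic, series without trend:", round(lr_statistic(no_trend, phi), 2))
```

The point of the sketch is that a single statistic is computed directly from the original series, with no intermediate dating of peaks and troughs.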

Similarly, if the task is that of estimation, there is only one way of obtaining
the maximum likelihood estimates of the parameters in equation (1).⁶ It
may be added that the proofs of optimal properties of the maximum likeli- 
hood estimates are already available for a large class of stochastic difference 
equation systems. 
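
The uniqueness claim can be made concrete by writing out the likelihood. The display below is a sketch under assumptions that are not spelled out in the review: $T$ denotes the number of observations, $\sigma^2$ the disturbance variance, $\phi(t)$ is known, the first two observations are treated as fixed, and equation (1) is read as $x_t = [\alpha_{10} + \alpha_{11}\phi(t)]x_{t-1} + \alpha_2 x_{t-2} + \alpha_0 + v_t$.

```latex
% Conditional log-likelihood of x_3, ..., x_T given x_1 and x_2
\log L(\alpha, \sigma^2)
  = -\frac{T-2}{2}\,\log\!\left(2\pi\sigma^2\right)
    - \frac{1}{2\sigma^2} \sum_{t=3}^{T}
      \Bigl( x_t - [\alpha_{10} + \alpha_{11}\phi(t)]\,x_{t-1}
             - \alpha_2 x_{t-2} - \alpha_0 \Bigr)^2

% For any fixed sigma^2 the alphas that maximize log L minimize the sum of
% squares, so they are the least-squares coefficients; then
\hat{\sigma}^2 = \frac{1}{T-2} \sum_{t=3}^{T} \hat{v}_t^{\,2},
\qquad
\hat{v}_t = x_t - [\hat{\alpha}_{10} + \hat{\alpha}_{11}\phi(t)]\,x_{t-1}
            - \hat{\alpha}_2 x_{t-2} - \hat{\alpha}_0 .
```

Under these assumptions every investigator who starts from the same series and the same $\phi(t)$ arrives at the same estimates, since the sum of squares has a unique minimum whenever the regressors are not linearly dependent.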

When it comes to considering the nonparametric types of hypotheses, 
statistical theory is not quite so well developed and a good deal of work
remains to be done. But even there important results are already at the 
economist’s disposal. 

Whether the model implicit in the Burns-Mitchell method is of the type
described by equation (1) cannot easily be answered. The authors perhaps 


5 “Our methods of determining specific cycles make no pretensions to elegance. Since no fast line 
separates erratic or episodic movements from specific cycles, or erratic turns from cyclical turns, there
is ample opportunity for vagaries of judgment. At times our rules failed to yield a clear-cut decision. At 
times the members of our statistical staff disagreed in their efforts to apply the rules to a given series” 
(p. 64). 

6 It might be interesting to construct a set of series of the type described by equation (1) and subject
each series to the process of timing, duration, and amplitude measurement, etc. followed by the authors. 
Then the tests of presence of the secular component could be made both by the methods of the book 
and by the likelihood ratio method based on the original series. The quality of estimates of the quasi- 
period and other cycle characteristics obtained by the book’s method and from the maximum likelihood 
criterion could then also be compared. 
think of their model as less restrictive. The reviewer’s inclination is to doubt
this, but no rigorous solution is possible until the underlying model is ex- 
plicitly given. As it is, there is a very serious danger of the type the authors 
are most anxious to avoid: that of influencing the results of the empirical 
studies by assumptions hidden in the techniques used. 

To avoid misunderstanding, it may be well to say that the reviewer’s 
main objection is not to the particular method used, nor even primarily to 
the method’s shortcomings such as occasional arbitrariness, or possible 
inefficiency or inconsistency. The principal objection is to the lack of com- 
mitment, in advance of looking at the data, to the principles of drawing 
inferences from the data. Modern statistical theory provides such a set of 
principles; these are imperfect and perhaps not unique. But until some com- 
mitment with regard to principles of inference exists, it is impossible to 
evaluate the reliability of whatever conclusions have been reached. 

On the other hand, if the basic statistical inference principles are accepted, 
one cannot get around the need for rigorous formulation of all hypotheses in 
terms of the relevant classes of probability distributions. But this does not 
mean that in order to proceed with, say, estimation of the joint distribution 
of the observed variables (this includes any interesting characteristic traits, 
e.g., the quasi-period, of the observed series) one must accept the picture of
the economic fluctuations provided by a particular school of thought. All 
that is necessary is to have the list of relevant variables (this is also true of 
the authors’ approach) and to know the class of probability distributions of 
the observed variables (a stochastic difference equation system with non- 
autocorrelated disturbances is an example of how such a class can be de- 
fined). Thus, for the purpose of dealing with the past probability distribution 
of the observed variables, it is unnecessary to make assumptions with regard 
to, say, the effect of the rate of interest on the demand for investment. 
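
The parenthetical example above, a stochastic difference equation with non-autocorrelated disturbances, can be made concrete with a short sketch. It assumes, purely for illustration, a single second-order equation x_t = a1*x_{t-1} + a2*x_{t-2} + e_t with independent normal disturbances; the coefficients 1.0 and -0.5 and the function names are hypothetical and come neither from the book nor from the review. When the characteristic roots are complex, the quasi-period is 2*pi/theta, where cos(theta) = a1 / (2*sqrt(-a2)).

```python
import math
import random

def quasi_period(a1, a2):
    """Quasi-period implied by x_t = a1*x_{t-1} + a2*x_{t-2} + e_t.

    Defined only when the characteristic roots are complex (a1^2 + 4*a2 < 0).
    """
    if a1 * a1 + 4.0 * a2 >= 0.0:
        raise ValueError("real roots: no quasi-periodic component")
    theta = math.acos(a1 / (2.0 * math.sqrt(-a2)))
    return 2.0 * math.pi / theta

def simulate(a1, a2, n, sigma=1.0, seed=0):
    """Draw one series of length n with independent N(0, sigma^2) disturbances."""
    rng = random.Random(seed)
    x = [0.0, 0.0]  # arbitrary starting values (an assumption of this sketch)
    for _ in range(n):
        x.append(a1 * x[-1] + a2 * x[-2] + rng.gauss(0.0, sigma))
    return x[2:]

# Hypothetical coefficients chosen so that the roots are complex.
series = simulate(1.0, -0.5, 200)
print(len(series), quasi_period(1.0, -0.5))
```

Nothing in the sketch specifies an individual behavioral relation such as the response of investment to the rate of interest; the two coefficients and the disturbance variance only select one member of a class of probability distributions for the observed series.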

One may conjecture that Burns and Mitchell, in their desire to avoid being 
influenced by the (unnecessary) assumptions of the latter type (describing 
individual economic relations), have been carried one step further and have 
eliminated explicit rigorous treatment of the (necessary) assumptions of the 
former type (defining the class of admissible probability distributions). 
Burns and Mitchell are by no means the only ones to have done this, but 
they are among those who have put more emphasis on this aspect of their 
method; in fact it would seem at times that they regard it as a virtue rather 
than a weakness. If this is so, there is a fundamental methodological problem 
involved and it would seem highly desirable that it should be brought out 
into the open. 

In noting that it is unnecessary to make assumptions with regard to, say, 
the effect of the rate of interest on the demand for investment the reviewer 
inserted the qualifying phrase ‘‘for the purpose of dealing with the past 
probability distribution of the observed variables.” This qualification is 
important because it focuses attention on the difference between the less 
ambitious nature of the authors’ objective as compared with that of, say, 
Tinbergen, or more recently, Klein, Haavelmo, and others. The objective 
of the latter group of workers is what has been called estimation (or hypo- 
thesis testing) of the economic structure, i.e., of the nature of relationships 
describing the behavior of certain groups within the economy (consumers, 
investors, etc.). For the purposes of structural estimation, as distinct from 
estimation of the past probability distribution of the observed variables, it 
is necessary to make the so-called identifying assumptions. Such assumptions 
may take the form of postulating that certain variables are absent from 
specified structural relationships. Since structural knowledge is of the essence 
for many types of prediction (when structural change is anticipated) and in 
formulating policy recommendations, one should recognize that the authors’ 
success in avoiding the utilization of the theorist’s ideas is due to the more 
modest nature of their task. This is not to imply that this task is unimpor- 
tant; but the scornful reference to the theorist’s “dreamland of equilibrium” 
might have been omitted if it had been realized that the theorist’s contribu- 
tion is indispensable at a stage which Measuring Business Cycles does not 
reach. (Even when the objective is merely to estimate past distributions of 
the observed variables, the utilization of theoretical knowledge increases the 
efficiency of the estimates.) 

This review is, in a sense, highly unfair to the authors. It fails to give 
uniform coverage to the different matters treated in the volume. It over- 
looks the highly fascinating nature of various results. It does not put suffi- 
cient stress either on the book’s value as a collection of interesting and often 
hard to get data or on the stimulating and thought-provoking ideas with 
which Measuring Business Cycles is filled. Finally, it does not give an ade- 
quate picture of the careful and unbiased reasoning which characterizes the 
volume. 

But if the reviewer’s appraisal of the book’s contribution is correct, 
Measuring Business Cycles follows a line of attack which is not likely to 
produce results commensurate with the required input of time and resources. 
Moreover, further work along entirely different lines is necessary in order to 
establish which of the results are significant. 


Calcolo delle Probabilità: Vol. I, Fondamenti della Teoria: Applicazioni alla 
Statistica, alla Teoria degli Errori, alla Balistica ed alla Fisica, Third Edition. 
Guido Castelnuovo. Bologna, Italy: Nicola Zanichelli, 1947. Pp. xxvii, 321. 700 
lire. 


REVIEW BY W. FELLER 
Professor of Mathematics, Cornell University 
THE first edition appeared in 1918 and was appreciated as a progressive 
book with serious mathematical tendencies. During the intervening 
29 years both the theory of probability and mathematical statistics have 
undergone a radical change. Some of the newer developments in the theory 
of probability will find place in a second, more mathematical, volume. The 
progress in statistical methodology is not reflected in the present volume 
and not mentioned in the introduction. This first volume serves as an ele- 
mentary introduction more or less according to traditional lines: it barely 
touches the substance of the theory of probability and dwells on its classical 
trimmings. The theory of probability is represented by the elementary 
composition rules, the binomial distribution with the normal and Poisson 
approximations, the central limit theorem (without proof), the simplest case 
of the law of large numbers (Cantelli), and the multidimensional normal 
distribution. Stieltjes integrals and characteristic or generating functions are 
not mentioned. 

More than half of the book is devoted to various applications. The statis- 
tical theory is of the pre-Pearson and pre-Fisher era when Bayes’ rule was 
accepted and Lexis’ dispersion theory was the height of sophistication. 
Furthermore, there are the customary chapters devoted to a “derivation” of 
the normal law, to the theory of errors, to ballistics, and to the Maxwell dis- 
tribution of velocities. The exposition is clear and concise and makes the 
reader understand the original merits of the book. 


An Index of Mathematical Tables. A. Fletcher (Lecturer in Applied Mathe- 
matics), J. C. P. Miller (Lecturer in Applied Mathematics), and L. Rosenhead 
(Professor of Applied Mathematics). (University of Liverpool, England.) London, 
W.C. 1: Scientific Computing Service Ltd. (23 Bedford Square), 1946. Pp. viii, 
451. 75s. (New York 18: McGraw-Hill Book Co. Inc. (330 West 42nd St.), 1946. 
$16.00.) 


REVIEW BY CHURCHILL EISENHART 
Chief, Statistical Engineering Laboratory, National Bureau of Standards 


THIS is an index of mathematical tables. Tables of experimentally determined 
quantities such as chemical and physical constants, mortality experience, 
etc., have almost always been excluded. An exception is tables of random 
numbers, to which there are several references. Its value to a statistician, 
therefore, will be in direct proportion to the extent to which the performance 
of his duties calls for the use of mathematics above simple arithmetic. If he 
is concerned principally with the collection of data by interviews, question- 
naires, or schedules and their presentation in tabular or graphical forms, 
he will have little occasion to refer to this work. If on the other hand his 
work involves the fitting of growth curves, regression formulae, and trend 
lines to empirical data, and especially if he utilizes the mathematical theory 
of probability to determine with what confidence predictions can be made 
when data are collected in a specified way and analysed in a prescribed 
manner, he will find this Index a very useful—practically indispensable— 
reference work, one that he will expect his library to have on hand for ready 
reference if he does not purchase a copy for his own use. 
The Index contains the following two main divisions: Part I, pages 18-372, 
Index According to Functions, and Part II, pages 373-444, Bibliography. 
An index to Part I follows Part II, pages 445-451. Following the title page, 
the book opens with a foreword (p. iii) by Professor D. R. Hartree, FRS 
(Professor of Physics, Manchester University); a publisher’s preface (p. iv) 
by L. J. Comrie (Director, Scientific Computing Service Ltd.); two authors’ 
prefaces by Professor Rosenhead (p. v) and Drs. Fletcher and Miller (pp. 
vi-vii); a table of contents (p. viii); and a clearly articulated introduction 
(pp. 1-15), from which much of the material in the present review has been 
extracted. 

Work on the Index began in 1939. Publication was intended for early in 
1944 but was delayed two years by unsurmountable difficulties besetting 
printers in the United Kingdom. On this account the book was revised in 
proof to take account of new material to the end of 1944. Since it was neces- 
sary to draw the line somewhere, everything published after the end of 1944 
was deliberately ignored. 

While the Index According to Functions, Part I, is the cardinal feature of 
the Index, the authors have gone to considerable trouble to make Part II, 
which gives an alphabetical list of references used in Part I, as complete and 
reliable as possible. They have included in Part II a considerable number of 
books on graphical and numerical methods, nomography, probability, and 
statistics in the wider sense not utilized in Part I. For this latter reason, and 
since understanding of Part I depends upon familiarity with Part II, it will 
be convenient and advantageous to describe Part II somewhat more fully 
without further delay. 

Part II is a complete list of the published material referred to in Part I, 
with some additional references, arranged alphabetically according to 
authors. Under each name the entries are arranged chronologically. When 
two or more works of the same authorship are published in the same year, 
they are distinguished by letters a, b, . . . , placed after the year, e.g., Pearson 
1920a, Pearson 1920b; but the practice sometimes followed is exemplified by 
Pearson 1902, Pearson 1902a, Pearson 1902b—an explanation of the exis- 
tence of these two different practices is given in the Introduction (p. 7). The 
authors have been careful to indicate, by means of a prefixed asterisk, when 
a particular item has not been seen, so that information given with regard to 
it is secondhand. Only slightly more than 300 of the over 2000 entries in 
Part II are marked by asterisks. 

In the case of books or pamphlets brief titles are given, but the titles of 
journal references are rarely given—only other bibliographical details. This 
absence of titles in the case of journal references undoubtedly saved much 
space, but the present reviewer has found these omissions quite tantalizing, 
in some instances to the point of annoyance. This latter consequence could 
be largely obviated in subsequent editions by cross-referencing to the section 
or sections of Part I in which the item concerned is mentioned. The city of 
publication is almost always given for books and pamphlets; and for those 
published comparatively recently, the name of the publisher also. Whether 
the authors have seen a particular item or not, they have always indicated 
the year of publication of the last edition known to them and have tried to 
give the year of publication of the first edition; but often a very different 
date, that of the edition actually consulted, may be the year in the reference. 
References to old tables have been included only when they seem to be still 
of interest. 

In the case of articles and tables published in periodicals, the authors 
have given the actual year of publication of the item concerned whenever 
possible but in some instances were obliged by lack of information to use the 
date given on the title page of the completed volume, although the part 
containing the relevant item may have been published in an earlier year. In 
this connection they remark that “possible errors of one year arising in this 
way were eliminated as far as was possible without undue expenditure of 
effort, but a number must remain. ... even at the present time there are 
periodicals which give information about the date of publication which is 
inadequate or, worse, seriously misleading” (p. 8). (It has always been a 
source of mystery to the present reviewer why the editors of so many jour- 
nals published in this country are resistant to the practice of indicating in a 
footnote on the first page of each article the date on which it was received 
for publication.) 

As noted above, Part II includes various special statistical tables not 
indexed in Part I plus “a considerable number of books on probability and 
statistics in the wider sense. These range from treatises on fundamentals 
down to elementary works concerned mainly with applications to education, 
psychology, medicine and other subjects; some of the latter owe their in- 
clusion to an accurate table of the error integral. Books on probability of an 
entirely non-mathematical kind have normally been excluded.” The present 
reviewer counted approximately sixty books on probability and statistics 
in the wider sense available in English, plus about ten in foreign languages. 
In the main these books can be located only by thumbing through Part II, 
but some of the better known and more standard works are listed on pages 
14-15 of the Introduction. 

In Part II there are numerous references to Biometrika, some references to 
the Annals of Mathematical Statistics, and at least one reference to Psycho- 
metrika and Sankhyā. The reviewer recalls no reference to this JOURNAL, nor 
to Econometrica, nor to the Annals of Eugenics—fertile sources of special 
mathematical tables useful in the development of statistical theory or the 
application of statistical methods. Hence, the owner of the Index will find it 
advantageous to consult Mathematical Tables and Other Aids to Computation 
(MTAC), published quarterly by the National Research Council, Washing- 
ton, for references to the statistical tables of recent vintage and of the more 
technical kinds. 

While books on nomography and other graphical methods have not in 
general been included in Part II, those which have may be located, along 
with books on numerical methods generally, by consulting pages 13-14 of the 
Introduction. 


Let us now consider the arrangement and scope of Part I, which is the 
index proper or “where-to-find-it” portion of the present volume. Part I is 
divided into 24 sections, each devoted to tables of a particular group of 
functions. Each section is composed of a number of articles, and each article 
deals with tables of one function or of a strictly limited group of very closely 
related functions. The order in which the items are listed in an article is 
normally that of the number of decimal places or number of significant 
figures given. Normally each item occupies a single line and gives, in order 
from left to right, information about (a) number of decimals or figures, (b) 
interval and range of argument, (c) facilities for interpolation, and (d) 
authorship and date, in the form used in Part II. The authors have distin- 
guished carefully between numbers of decimal places and numbers of sig- 
nificant figures and have been remarkably successful in making their descrip- 
tions of numbers of decimals or figures as accurate as possible without 
sacrificing conciseness: “if a mainly 10-decimal table has a small proportion 
of 9-decimal values, we normally use ‘9-10 dec.’ or ‘about 10 dec.’ but if the 
proportion is very small indeed we may say simply ‘10 dec.’” (p. 9). When 
all the values given for a function are exact, this is usually stated. 


There are two valuable features of Part I which deserve special notice. 
First, in conjunction with each article listed in Part I, the authors have 
indicated the errata known to them. When the requisite corrigenda are few 
and short, these are given in the text; when they are numerous or otherwise 
extensive, references are given. Second, where a number of tables of one 
particular function are listed, those which are outstanding tables of their 
kind have frequently been indicated by printing the names of their authors 
and dates in boldface type. With regard to these the authors state: “The 
tables so emphasized are those which a computer in this country, possessing 
only the more important foreign tables, would probably regard as standard. 
... These tables are so much used that most of them have ‘reputations’ 
of some kind in computing circles; in some cases the results of detailed 
examinations for errors have been published. This has made the selection of 
tables as outstanding much easier, since we ourselves have naturally been 
unable to undertake any extensive examination for errors. ... 
particular regard to what we believe to be the accuracy of tables selected for 
emphasis, though naturally a very extensive table, particularly if it is old, 
may contain a few errors. It is evident that the process of selection is not 
completely objective, and we have had a little doubt whether bold type 
should be used at all, but on the whole we think that users of the Index will 
welcome some indication, however imperfect, when they are confronted 
with a long list of tables” (pp. 11-12). Statisticians will find these designations 
of outstanding tables particularly valuable in connection with tables of 
areas and ordinates of the normal distribution (error function) and tables of 
the percentage points of the normal, F, and t distributions. Curiously, no 
table of the χ² distribution or of its percentage points has been singled out 
for emphasis by the device of boldface type. Perhaps this is merely an over- 
sight, or perhaps they concur with the present reviewer that new tables of the 
χ² distribution are needed, expressed in terms of χ²/n, where n is the “number 
of degrees of freedom.” 
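
To illustrate what a table “expressed in terms of χ²/n” would tabulate, the following is a minimal sketch that computes upper 5 per cent points of χ² and divides them by n. The function names, the choice of the 5 per cent level, the series evaluation of the incomplete gamma function, and the bisection search are all assumptions of this sketch; the review does not say how such tables should be computed or laid out.

```python
import math

def chi2_cdf(x, n):
    """P(chi-square with n degrees of freedom <= x), using the power series
    for the regularized lower incomplete gamma function."""
    if x <= 0.0:
        return 0.0
    a, h = n / 2.0, x / 2.0
    term = total = 1.0
    k = 0
    while term > 1e-16 * total:
        k += 1
        term *= h / (a + k)
        total += term
    return total * math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))

def chi2_quantile(p, n):
    """Value x such that chi2_cdf(x, n) = p, found by bisection."""
    lo, hi = 0.0, float(n)
    while chi2_cdf(hi, n) < p:  # widen the bracket until it contains the root
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, n) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Upper 5 per cent points, scaled by the number of degrees of freedom.
for n in (1, 2, 5, 10, 30, 100):
    print(n, round(chi2_quantile(0.95, n) / n, 4))
```

Because χ²/n has mean 1 for every n, the scaled percentage points drift toward 1 as n grows, which is presumably what makes such a table compact and easy to interpolate.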
The twenty-four sections of Part I bear the following general titles: 


 1) Primes, Factors, Products and Quotients 
 2) Powers, Positive, Negative and Fractional 
 3) Factorials, Binomial Coefficients, Partitions, etc. 
 4) Bernoulli and Euler Numbers and Polynomials. Sums of Powers and of 
    Inverse Powers. Differences and Derivatives of Zero, etc. 
 5) Mathematical Constants, π, e, M, γ, etc.; Multiples and Powers. Roots 
    of Algebraic and Transcendental Equations. Miscellaneous Constants. 
    Conversion Tables 
 6) Common Logarithms, Antilogarithms, Addition and Subtraction Logarithms 
 7) Natural Trigonometrical Functions. Miscellaneous Functions connected 
    with the Circle and the Sphere 
 8) Logarithms of Trigonometrical Functions 
 9) Inverse Circular Functions. Trigonometrical Functions of Two or 
    Three Arguments (including Products, Solutions of Plane and Spherical 
    Triangles, etc.). Sexagesimal Interpolation Tables 
10) Natural and Logarithmic Values of Exponential and Hyperbolic Functions 
11) Natural Logarithms of Numbers and of Trigonometrical Functions. 
    Inverse Hyperbolic Functions 
12) The Gudermannian, Combinations of Circular and Hyperbolic Functions, 
    Circular and Hyperbolic Functions of a Complex Variable 
13) Exponential and Logarithmic Integrals, Sine and Cosine Integrals, etc. 
14) Factorial or Gamma Function, Psi Function, Polygamma Functions, 
    Beta Function, Incomplete Gamma and Beta Functions 
15) The Error Integral, Higher Integrals, Derivatives. Hermite Polynomials 
    and Functions. Moments 
16) Legendre Functions 
17) Bessel Functions of Real Argument 
18) Bessel Functions of Pure Imaginary Argument, or Modified Bessel 
    Functions 
19) Bessel Functions of Complex Argument. Kelvin Functions 
20) Miscellaneous Bessel and Related Functions 
21) Elliptic Integrals, Elliptic Functions, Theta Functions 
22) Miscellaneous Higher Mathematical Functions 
23) Interpolation, Numerical Differentiation and Integration, Curve-Fitting 
24) Tables and Schedules for Harmonic Analysis, etc. 

The statistician will find here descriptions of, or references to, tables 
pertaining to the following: bivariate normal surface; Charlier series, Type A 
and Type B; chi-squared tables; correlation coefficients; correlation surface; 
curve-fitting (of frequency curves and regression formulae); Fisher’s 
z-distribution; Gompertz function; Hermite polynomials; incomplete beta 
function; incomplete gamma function; incomplete normal moment function; 
logarithmic growth curves; moments of the normal distribution; normal 
bivariate surface; normal moment function; normal probability distribution 
(univariate and bivariate); orthogonal polynomials (for curve-fitting); 
Pearson curves; Poisson distribution; probable error constant; probits; 
random numbers; range (probability distribution of); rejection of doubtful 
observations; smoothing; Snedecor’s F; Student’s t; Student’s z; and 
tetrachoric functions. In most cases the index to Part I (at the end of the 
book) will lead one directly to the relevant sections of Part I. When this 
fails, pages 13-15 of the Introduction should be consulted. 

So far as this reviewer has been able to determine, the Index does not 
discuss or list tables of, or relating directly to, the binomial distribution; 
experimental designs; power functions of various statistical test criteria; 
ranks, runs, and other characteristics of order (except the range in normal 
samples); or statistical functions arising in multivariate analysis (except the 
correlation coefficient). To offset these shortcomings, it is appropriate, I 
feel, to draw attention to these words of Professor R. C. Archibald: “... 
again and again the reviewer has been exasperated by the practical impos- 
sibility of finding out what has been tabulated in tables edited by Karl 
Pearson. Now he may, with a sigh of relief, turn to the Index and find out!”¹ 

The Index, the first work of its kind, is a “must” for every university 
library and for every research center where computing is carried on exten- 
sively. It is dedicated (p. v) “to the many scientists the world over who are 
working to build up a better civilization.” The extraordinary accuracy, 
comprehensiveness, and excellent typography of this work, compiled at a 
time when its authors and sponsor-publisher were also making their contri- 
butions to the war effort, are monuments to the meticulous care lavished 
upon it and must excite profound amazement and admiration on the part of 
anyone who has ever attempted something of a similar nature. On account 
of the high cost of its production and the relatively few sales to be expected 
in view of its specialized nature, the primary publisher feels that he “faces no 
prospects other than financial loss, which he does calmly with the knowledge 
that he has served science by helping this work to see the light of day in its 
present dignified form. ... Naturally the Index will need revision in a few 
years... . The incentive to face revision and reprinting will come largely 
from the reception afforded to this first edition—and in particular from 
those who value the work sufficiently to have a copy on their own desks rather 
than rely on more remote library copies” (p. iv). 


The Advanced Theory of Statistics, Vol. II. Maurice G. Kendall (Statistician to 
the Chamber of Shipping of the United Kingdom, Bury Court, London). Lon- 
don W.C. 2: Charles Griffin and Company Ltd. (42 Drury Lane), 1946. Pp. vii, 
521. 50s.; abroad, including postage, 51s. 6d. (Available in the United States from 
Stechert-Hafner, Inc., 31 East 10th Street, New York 3, at $15 less 10 per cent.) 


REVIEW BY E. J. GUMBEL 


Associate Professor, Brooklyn College 


IT REQUIRES a good deal of courage to write a book about statistics now, and 
even more so about its advanced parts, since the development of these 
methods and their expansion into neighboring fields has been extraordinarily 


1 Math Tables & Other Aids Comp. 2 (13): 17 Ja. '46. 
quick in recent years. Old methods have been refuted, or their limitations 
discovered, and new refined methods have been constructed. Apart from the 
old application to economics and insurance mathematics, statistical pro- 
cedures have been introduced into biology, physics, engineering, and as- 
tronomy and have become the basis of new branches such as econometrics, 
quality control, and radioactivity. As far as the reviewer is aware, only once 
has an attempt been made, by R. von Mises in his German book, to cover 
all these fields. The next essay, Borel’s collection, can only be regarded as a 
proof of the impossibility of condensing this vast domain into a single book. 

Furthermore, there is no clear borderline between theoretical statistics and 
the calculus of probabilities, the latter being at the same time the basis and 
the crowning of statistics. Consequently an author of a book on statistics is 
faced with two dangers: either its outgrowing into a theory of probabilities or 
its degeneration into a series of kitchen rules. Kendall avoids sophistication 
and overempiricism. He does not attain the philosophical beauty and mathe- 
matical rigor of Cramér’s book, but he also avoids the purely empiristic 
attitude of letting statistics degenerate into a system of interpolation 
formulae ready for any use or abuse. Thus he succeeds in satisfying an urgent 
need. 

This valuable book, besides introducing new problems, reconsiders many 
topics of the first volume on a higher level. Although many references are 
made to the first part, the second volume is an independent book. The author 
avoids the usual term “modern statistics” and clearly states the relation of 
present-day problems and methods to the early work of Laplace, Gauss, and 
Cauchy up to Lexis. 

The reviewer regrets a certain logical discontinuity in the arrangement of 
the fourteen chapters. The first two chapters (17 and 18) deal with the 
estimation of parameters, starting with R. A. Fisher’s method of maximum 
likelihood. This treatment follows Hotelling. The study of confidence in- 
tervals (E. S. Pearson) and of fiducial inference (R. A. Fisher) leads to the 
different tests of significance (Chap. 21). This consequent line of thought is 
suddenly interrupted by Chapter 22 dealing with regression and by the 
analysis of variance (Chaps. 23 and 24), which have a connection neither 
with the preceding nor with the following chapters. The problem of signifi- 
cance, belonging to the previous line of thought, is taken up again in Chapters 
26 and 27, and the analysis of variance returns as multivariate analysis in 
Chapter 28. The reviewer believes that Chapters 17 to 21 dealing with 
estimation, inference, and significance ought to have been followed by 
Chapters 26 and 27 and that Chapters 23, 24, and 28 dealing with the analy- 
sis of variance belong together. 

In Chapter 21, on tests of significance, the probability papers now widely 
used by engineers might have been mentioned. However, this and other 
lacks must be accepted since completeness in such a rapidly expanding 
field cannot be reached, and it seems doubtful whether it is even desirable. 
The last two chapters dealing with time series are relatively short, probably 
because the author has devoted a separate brochure to these problems. The 
important contribution of Stumpf on periodogram analysis might have been 
given more space. 

The essential advantage of the book is the clear logical arrangement 
within each chapter. Many problems are first stated in such a simple way 
that an intuitive answer is ready at hand. This is then confirmed, or refuted, 
by mathematical methods, a procedure highly recommendable from the 
pedagogical point of view. Cauchy’s distribution is frequently shown as an 
antithesis to the normal one, thus clearly demonstrating the limitations of 
the normal theory. After the mathematical treatment, numerical examples 
for the most important methods are given. Finally, exercises are chosen from 
the current literature summarizing a large body of research which could not 
be treated exhaustively. Thus, the book offers the right choice for both types 
of readers, for those interested in the advancement of research and for the 
practical statisticians who want definite prescriptions of how to act. 

The variety of numerical examples taken from economics, agriculture, 
biology and medicine may be accepted as a substitute for not treating 
separately the different independent statistical problems arising in these 
sciences. The index and the large bibliography are very valuable. A list of the 
graphs and tables would have increased the practical usefulness of this book. 


Regression Analysis of Production Costs and Factory Operations, Second Edi- 
tion. Philip Lyle (Tate & Lyle, Ltd., Sugar Refiners, 40 Berkeley Square, Lon- 
don, W.1). Edinburgh 1, Scotland: Oliver & Boyd Ltd. (Tweeddale Court), 1946. 
Pp. xii, 204. 16s. 
REVIEW BY PAUL PEACH 
Associate Professor, Institute of Statistics 
University of North Carolina 


THIS book, addressed to accountants, engineers, and factory people, is 
intended to enable them to acquire, by home study, some facility with 
simple least squares techniques for making cost analyses and studying in- 
dustrial problems. Mr. Lyle discusses the fitting of the straight line, the 
plane, and the parabola, gives methods for estimating the standard errors 
of the estimates and the regression coefficients, and displays rather well 
the relationship between correlation and regression. 

Since Gauss’ time there have been many books on least squares, most of 
which now peacefully await the blast of Gabriel’s horn. A book must have 
vitality of no common order to survive in this lush jungle. Nevertheless, Mr. 
Lyle has recognized a genuine need for a work on regression addressed to 
industrial readers, and has kept before his mind’s eye the standards which 
such a book ought to meet. It should be modern, conformable to 1947 usage; 
it should be heuristic rather than rigorous; it should provide a basis for 
further study. 

On the whole, the work has been creditably executed. The notation is 
reasonably familiar, and the language is straightforward. The treatment is 
more mathematical than I should have used, and while (as the dust jacket 
points out) the calculus is not mentioned, there is quite a bit of algebra which, 
being written into the sentence structure, tends to impede the reading. 
Nevertheless, the exposition is in some parts really excellent, and the dia- 
grams are likely to be especially helpful to students. 

Perhaps the highest praise I can give Mr. Lyle is to say that his book 
belongs definitely to the new order of statistical writing, and that any stu- 
dent who masters it will need to disencumber himself of very little deadwood 
if he decides later to study statistics seriously under good modern teachers. 
Unhappily, some deadwood is present, and the book’s errors are serious 
enough to detract materially from its net worth. 

For example, there is no development of the idea of randomness as a sine 
qua non for the application of least squares techniques. Mr. Lyle’s chief 
example of an exercise in fitting the straight line consists of a record showing 
total factory cost and total output for 40 consecutive weeks. He fits a 
straight line and calculates standard errors and fiducial limits using Stu- 
dent’s distribution. But his errors of estimate (col. 4, p. 13) are obviously 
not observations on any random variable with zero mean; instead, we have 
long series of plus errors, followed by series of minus errors. We can thus 
assert with some confidence that the 40 residuals do not represent drawings 
from a bowl, and cannot be used for estimating variances and standard 
errors. 
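
The objection can be made concrete with a runs test on the signs of the residuals. The Python sketch below is not taken from Mr. Lyle's book or from this review; the function name `runs_test` and the example series are hypothetical, and the usual normal approximation for the number of runs (Wald and Wolfowitz) is assumed. Long blocks of plus errors followed by blocks of minus errors give far fewer runs than a random sequence would.

```python
import math

def runs_test(residuals):
    """Runs test on the signs of a residual series.

    Returns (observed runs, expected runs under randomness, z-score).
    Zero residuals are dropped; both signs must be present.
    """
    signs = [r > 0 for r in residuals if r != 0]
    n_pos = sum(signs)
    n_neg = len(signs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need residuals of both signs")
    # A new run starts wherever the sign changes.
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    n = n_pos + n_neg
    expected = 2.0 * n_pos * n_neg / n + 1.0
    variance = (2.0 * n_pos * n_neg * (2.0 * n_pos * n_neg - n)) / (n * n * (n - 1.0))
    z = (runs - expected) / math.sqrt(variance)
    return runs, expected, z

# Hypothetical residuals: 20 plus errors followed by 20 minus errors.
example = [1.0] * 20 + [-1.0] * 20
runs, expected, z = runs_test(example)
print(runs, expected, round(z, 2))
```

A strongly negative z indicates too few runs for the residuals to be treated as independent drawings, which is the situation the reviewer describes.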

Mr. Lyle attempts, I think unwisely, to explain the notion of fiducial 
probability. After telling how fiducial limits are obtained, he says (p. 53), 
“These limits are called Fiducial Limits, and when they are calculated for 
the 5 per cent level we can say that the probability that the true or popula- 
tion value of the statistic lies between these limits is 95 per cent or 0.95. ...” 
This incorrect statement is almost certain to enlarge the penumbra of con- 
fused thinking on this subject which is, alas, already too extensive. The dis- 
cussion of degrees of freedom is not to my mind very illuminating or con- 
vincing, and the statement on page 92 that “The median is obviously a most 
inconvenient form of mean to use, particularly if we have a large number of 
items” might be disputed. 

There are a few infelicities of expression, though on the whole the book is 
a literate work. I personally prefer to use data as a plural noun. I don’t think 
the general subject of correlation “is discussed fully in Chapter III” nor in- 
deed in any other publication I can call to mind. The appendices on “Mean- 
ing of Regression” and “Meaning of Correlation” hardly live up to their 
ambitious titles, more’s the pity; for some competent discussion along these 
lines is desperately needed. Among other details, though, I must not forget 
a word of praise for the excellent typography and bookbinding. My main 
carp here is against the diagrams, which are bunched together in the back 
of the book instead of being scattered through the text. 

It may be fairest to appraise Mr. Lyle’s book, not as if it were something 
irrevocable and final, but rather as the fruit of an attempt which has been
partially successful, giving us hope of more ample success for the future. 
There is nothing wrong with the book that judicious revision and amplifica- 
tion will not remedy. The important fact is that Mr. Lyle has adopted a point 
of view which recognizes industrial statistics as a branch of applied statistics 
as we understand it today. The field wants such conceptual unification, and 
Mr. Lyle and his publishers have my own thanks for this contribution to 
that end. 


Statistical Analysis in Biology, Second Edition. K. Mather (Head of the Genetics 
Department, John Innes Horticultural Institution, London). Foreword by R. A. 
Fisher (Arthur Balfour Professor, University of Cambridge). London W.C. 2: 
Methuen & Co. Ltd. (36 Essex St., Strand), 1946. Pp. iv, 267. 16s. (New York 3:
Interscience Publishers, Inc. (215 Fourth Avenue), 1947. $5.00.) Two reviews 
follow: 


REVIEW BY JOHN W. FERTIG
Professor of Biostatistics, School of Public Health of the Faculty of Medicine 
Columbia University 


THE first edition of this book appeared in 1943 and was reviewed by Cochran
in this JOURNAL, December 1944, pages 525-527. The second edition is
practically identical with the first except that Chapter XIII on transforma- 
tions has been added. A brief account is given of the angular transformation 
for proportions together with an illustration. A fuller treatment is given of 
the probit transformation so widely used in biological assay work. 

The methods discussed in this book are essentially those presented by R. 
A. Fisher in Statistical Methods for Research Workers and in The Design of 
Experiments. The book is a very useful adjunct to those of Fisher and com- 
pares very favorably with similar books such as those by Snedecor, Goulden, 
and others. Most of the examples used to illustrate the various techniques 
are drawn from the fields of genetics or agronomy. This is not surprising in 
view of the experience of the author in these fields. The book might be of 
somewhat more general interest to biologists if the examples represented a 
broader coverage. 

The brief section on sampling distributions brings out admirably the 
relation between the normal, t, χ², and z distributions. The section on the
analysis of variance and the planning of experiments should be interesting 
to most biologists. Of particular interest in this connection is the detailed 
presentation on the partitioning of the variation among the individual 
degrees of freedom. Chapter XI, on the analysis of frequency, gives a very 
interesting discussion of χ² including general rules for partitioning it among
the various degrees of freedom. 

In the application of the t-test to compare the means of two independent 
samples the author fails to mention the need for pooling the variation within 
samples when the samples are of unequal size. It seems to this reviewer that, 
for a book limited in size as this one is, the space devoted to the higher order
polynomials and the mechanics of fitting them is out of proportion to their 
utility in the analysis of biological data. There also seems to be an inordinate 
emphasis on alternative methods of computation. Frequently a lot of alge- 
braic manipulation is included to show the equivalence of various forms of 
computation. This is particularly apparent in the chapter on χ². The inaccuracies
and ambiguities mentioned by Cochran in his review of the first
edition have in general not been clarified. 
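
The pooling step mentioned above for the two-sample t-test can be sketched as follows. This is a minimal illustration in Python under the usual assumption of a common within-sample variance; the function name `pooled_t` and the data are hypothetical and are not taken from Mather's book.

```python
import math

def pooled_t(sample_a, sample_b):
    """Two-sample t statistic with the within-sample variation pooled.

    Returns (t, degrees of freedom). Assumes both samples share one variance.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a = sum(sample_a) / n_a
    mean_b = sum(sample_b) / n_b
    # Sums of squares about each sample's own mean.
    ss_a = sum((x - mean_a) ** 2 for x in sample_a)
    ss_b = sum((x - mean_b) ** 2 for x in sample_b)
    df = n_a + n_b - 2
    pooled_var = (ss_a + ss_b) / df
    std_err = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean_a - mean_b) / std_err, df

# Hypothetical samples of unequal size.
t, df = pooled_t([5.0, 7.0, 9.0], [1.0, 2.0, 3.0, 4.0, 5.0])
print(round(t, 4), df)
```

With unequal sample sizes the two sums of squares are weighted by their own degrees of freedom, which is why the pooled form cannot be replaced by a simple average of the two sample variances.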


REVIEW BY HORACE W. NORTON
Meteorologist, U. S. Weather Bureau, Washington 


ACCORDING to the author’s preface to the second edition, the only considerable
change is the addition of a chapter of eighteen pages dealing with
transformations (an omission from the first edition noted by Cochran). The 
probit transformation is treated fairly adequately, though there is no remark 
to guide the reader in case the slope of the probit regression is not signifi- 
cantly different from zero. The angular transformation is presented with a 
worked example, and the square-root transformation is barely mentioned; no 
other transformation is discussed. Transformation is advocated to achieve 
constancy of error variance and as sometimes giving a simpler relation to 
some other variate. There is no discussion of other desiderata for transformed 
variates, such as normality of distribution or additivity of effects. The para- 
graph (p. 259) dealing with the ease of planning of experiments which are to 
be analyzed by means of a transformation will give a misimpression to those 
readers who do not realize that “any required level of precision” will usually 
refer to the untransformed variate and that there is some difficulty in finding 
the corresponding level for the transformed variate when the mean values to 
be observed are unknown. 
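
The variance-stabilizing purpose of the angular transformation can be illustrated with a short Python sketch. The function names and the sample size below are assumptions made for illustration only; the approximation used is the standard large-sample result that the transformed proportion has a variance close to 820.7/n squared degrees, whatever the true proportion.

```python
import math

def angular(p):
    """Angular transformation of a proportion p, in degrees."""
    return math.degrees(math.asin(math.sqrt(p)))

def binomial_variance(p, n):
    """Variance of an observed proportion; it changes with p."""
    return p * (1.0 - p) / n

def angular_variance(n):
    """Approximate variance of the transformed value, in squared degrees."""
    return (180.0 / math.pi) ** 2 / (4.0 * n)

n = 50  # hypothetical number of individuals per sample
for p in (0.1, 0.3, 0.5):
    print(p, round(angular(p), 2), binomial_variance(p, n), round(angular_variance(n), 2))
```

The untransformed variance depends on p, while the approximate variance on the angular scale depends only on n. A required precision stated for the proportion itself therefore still has to be translated to the angular scale, which needs some knowledge of p.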

This book is unusually lucid and should be widely used. Nevertheless, it 
deserves criticism for insufficient attention to form of statement, for there 
are many inaccuracies and inconsistencies and statements which are too 
sweeping or too loose. As examples of inaccuracies, there are the old error 
(pp. 35, 255) that kurtosis measures fullness of center and tails as compared 
to the normal distribution; the statement (p. 59) that the same result is 
reached by the t of n degrees of freedom, with paired comparisons, as by the
t of 2n degrees of freedom, if the paired observations are independent; and 
Section 58 on Fiducial Probability (pp. 231-233) which is based on sampling 
a normal distribution but contains several statements of the form of “...
with no resort to hypothesis ....”

The usual statement is made that a variance ratio is always the ratio of the 
larger to the smaller, but Table 17 shows a variance ratio of only 0.316, and 
its significance level is found by changing to t (in the text) with no explanation
of the departure from the stated rule. A fair sample of the sweeping
statements is (p. 218) “Inefficient estimates . . . should never be used.” 




There are many loose statements, such as (p. 30) “The normal curve is 
symmetrical and so the discrepancies cancel out” and “The... standard 
deviation is... markedly biased by grouping...” and that (p. 91) if a 
few observations are lost from an experiment “. .. its whole value will be 
forfeited.” Also the statements about tests of significance often use “signifi- 
cant” or “highly significant” unconventionally. 

The discussion of Yates’ correction for continuity would be improved by 
an explicit statement that it applies to any case of a single degree of freedom 
but does not always amount to half a unit and by deletion of the assertion 
that it always overcorrects. The analysis of variance is applied to discrete 
data in several worked examples without justification or apology. Con- 
clusions which are invalid in the sense that the adduced data cannot support 
them are drawn in several cases as (p. 65) that one variety is more prolific 
than another, the data being fruit yield in kilograms, and (p. 122) that a 
linear regression must be nonlinear between the origin and the first observed 
point. It is regrettable that calculation of the error sum of squares by differ- 
ence should receive such emphasis by repeated statement and example, and 
that the authors of the 0.1 per cent variance ratio table were not properly 
credited. 


Elements of Graduation. Norton D. Miller (Mathematician, Equitable Life As- 
surance Society of the United States, New York). A monograph sponsored by 
the Joint Committee on Actuarial Studies of the Actuarial Society of America 
and the American Institute of Actuaries. New York 1: Actuarial Society of 
America (393 Seventh Ave.), 1946. Pp. v, 79. Offset from typescript. $1.50. 


REVIEW BY T. N. E. GREVILLE
National Office of Vital Statistics 
U. S. Public Health Service, Washington 


THIS is the first of a series of monographs planned by the two actuarial
bodies on subjects in the syllabus required for their joint examinations.
The names of H. S. Beers, C. A. Spoerl, and H. H. Wolfenden are listed
as associate contributors. This discussion of graduation or smoothing of data 
is limited to the problems commonly encountered by the actuary and, in 
fact, is largely confined to the graduation of mortality rates. The graduation 
of time series, which would probably be the single topic of greatest interest 
to the general statistician, is not discussed. Furthermore, the scope of this 
monograph includes only such portions of the subject as fall within the syl- 
labus for Part 5 of the actuarial examinations under the title “elements of 
graduation” and not those parts which belong to “advanced graduation and 
interpolation,” covered in Part 8. The pertinent paragraph in the syllabus 
for Part 5 states that “a good fundamental understanding is required of the 
problems involved in making and testing the kinds of graduations of mor- 
tality tables or other series that the actuary is likely to meet in the course 
of his career.” Thus, this monograph is not intended to be a complete treatise 
on graduation, or even on the graduation of mortality tables, and has not
been written with the general reader in mind. It is designed rather to fill 
a specific need of students preparing for the examinations of the American 
actuarial bodies. 

In the opinion of the reviewer, it is highly successful in its stated purpose. 
The material which falls within its limited scope is systematically covered, 
and the ideas are so lucidly presented that it may unquestionably be read 
with profit by many outside the group for which it is specifically intended. 
The discussion, in the introductory chapter, of the nature and objectives of 
the graduation process, together with the justification for its use, is a con- 
cise but masterly presentation of a difficult subject. The customary tests of 
the acceptability of a graduation are described in the same chapter. The suc- 
ceeding five chapters deal with five particular methods of graduation: the 
graphic method, the interpolation method, the adjusted average method, 
the difference equation method, and graduation by mathematical formula. 
In general, all these methods appear to be adequately covered, but without 
such mathematical refinements as would more properly fall within the scope 
of the more advanced examination. However, the chapter on the interpola- 
tion method does not mention the very useful formulas recently developed 
by H. S. Beers (Record of the American Institute of Actuaries, Vol. 33, pp. 
245-260 and Vol. 34, pp. 14-20) and designed to secure maximum smooth- 
ness, and the chapter on the difference equation method fails to point out 
the important fact that a difference equation graduation is equivalent to 
the use of an adjusted average extended over an infinite range. 

Chapter 7 on select tables is the one which seems to the reviewer least 
adequate. While a detailed discussion of the subject is admittedly outside the 
scope of this monograph, the exposition would have been rendered much 
clearer if the successive steps in at least one method of graduating select 
tables had been outlined for illustrative purposes. Specifically, on page 55 the 
author states that “the data for duration zero and for the ultimate portion 
of the table are usually graduated first” but does not offer even a faint suggestion
as to how the graduation of the remainder of the select table might
proceed. 

Chapter 8 gives a particularly valuable discussion of the comparative advantages
and disadvantages, the spheres of application, and the limitations
of the various graduation methods considered in the monograph; and Chapter 9
gives three sets of numerical data on which the student may try his
hand. Lists of questions appear at the end of each chapter. 

An appendix gives, for reference, the derivation of certain formulas ap- 
pearing in the earlier chapters, with the guarantee that the candidate for 
Part 5 of the examinations will not be held responsible for this material. 
The author is especially to be commended for the inclusion in this appendix 
of a derivation of Jenkins’ fifth-difference smoothing interpolation formula 
which states correctly and completely the conditions imposed on the interpolating
curves. On the other hand, the conditions underlying the Karup-King
and Shovelton formulas are stated in unnecessarily restricted form. It
was shown by Jenkins in 1926 (Record of the American Institute of Actuaries,
Vol. 15, pp. 87-98) that the following are sufficient conditions for the Karup-King
formula: (a) The interpolating curve shall pass through the pivotal
points $u_x$ and $u_{x+1}$. (b) At the point $x$, the interpolating curve functions for
the adjoining intervals $x-1$ to $x$ and $x$ to $x+1$ shall have equal derivatives.
(c) The interpolating curve functions shall be polynomials of minimum degree,
subject to conditions (a) and (b).
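
The conditions for the Karup-King formula can be checked numerically. The Python sketch below uses the Everett form in which that formula is commonly written; it is an illustration under that assumption, not a transcription from the monograph, and the function name `karup_king` and the pivotal values are hypothetical.

```python
def karup_king(u_prev, u0, u1, u_next, s):
    """Karup-King interpolation at x + s, for 0 <= s <= 1.

    u_prev, u0, u1, u_next are the pivotal values at x-1, x, x+1, x+2.
    """
    t = 1.0 - s
    d2_0 = u_prev - 2.0 * u0 + u1   # second central difference at x
    d2_1 = u0 - 2.0 * u1 + u_next   # second central difference at x+1
    return (s * u1 + 0.5 * s * s * (s - 1.0) * d2_1
            + t * u0 + 0.5 * t * t * (t - 1.0) * d2_0)

pivots = [1.0, 4.0, 2.0, 7.0, 3.0]  # hypothetical pivotal values u_0 .. u_4

# Condition (a): the curve for the interval 1..2 passes through u_1 and u_2.
print(karup_king(*pivots[0:4], 0.0), karup_king(*pivots[0:4], 1.0))

# Condition (b): one-sided numerical slopes at the shared point x = 2 agree.
h = 1e-6
left_slope = (karup_king(*pivots[0:4], 1.0) - karup_king(*pivots[0:4], 1.0 - h)) / h
right_slope = (karup_king(*pivots[1:5], h) - karup_king(*pivots[1:5], 0.0)) / h
print(round(left_slope, 4), round(right_slope, 4))
```

Each interval is fitted by a cubic in s, and the slope at every pivotal point works out to half the difference of the two neighbouring pivotal values, so adjoining curves meet with equal derivatives.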

In the case of Shovelton’s formula, a sufficient set of conditions is ob- 
tained by adding the following to conditions (a) and (b) just given for the 
Karup-King formula: (c) When the six pivotal points $u_{x-2}$, $u_{x-1}$, $u_x$, $u_{x+1}$,
$u_{x+2}$, $u_{x+3}$ lie on the same fourth degree curve, the interpolated value at any
point between $x$ and $x+1$ shall be the corresponding ordinate of that curve.
(d) The interpolating curve functions shall be polynomials of minimum degree,
subject to conditions (a), (b), and (c). This set of conditions is believed
to be new. In the reviewer’s opinion, the conditions given here provide a 
more complete justification of these two formulas than the somewhat arti- 
ficial conditions given in the monograph. 

The appendix also contains the derivation of formulas for correcting the 
starting values in a difference equation graduation which were developed by 
C. A. Spoerl especially for this monograph. 


Ocherki po Istorii Statistiki XVII-XVIII Vekov. (Essays on the History of Statistics
of the 17th and 18th Centuries.) M. Ptukha. Moscow: State Publishing
House for Political Literature, 1945. Pp. 352. 


REVIEW BY G. M. KUZNETS


Associate Professor of Agricultural Economics 
University of California, Berkeley 


THE origin of modern statistics may be traced to the political arithmetic
of the 17th and 18th centuries and the mathematical work on probability
of about the same period. Little is to be found in this book on the basic
work of the great mathematicians who fashioned the calculus of probability. 
A few pages are allocated to Daniel Bernoulli (his memoirs on mortality due 
to small-pox and on the mean duration of marriage) and a few more to the 
demographic work of Laplace. There are also scattered references to other 
mathematicians. However, as a history of social statistics of the period this 
book is a valuable addition to the none too extensive literature on the history 
of the “numerical method as applied to the phenomena of human society.” 

The three great pioneers of such application—Graunt, Petty, and Halley—receive
detailed consideration in separate chapters. The eighty or so
pages devoted to the description and evaluation of their contributions are a 
model of careful and exacting scholarship. The succeeding chapters deal in 
turn with (a) the development of political arithmetic in England, Holland, 
France, and Germany in the first half of the 18th century (Chap. 5), (b) Swedish
population statistics (Chap. 6), (c) demographic work in France from the
middle of the 18th century to the beginning of the 19th century (Chap. 7). 
The final chapter of the book (Chap. 8) describes the development of social 
statistics in Russia to the beginning of the 19th century. This chapter also 
contains an account of the demographic work of Daniel Bernoulli with the 
following comment by the author: “D. Bernoulli was materially secure (in 
Switzerland, after leaving Russia), but nevertheless his thoughts constantly 
turned to Russia, to the development of our science and culture. That is why, 
regardless of the fact that he lived in Russia only eight years, in all justice 
we have the right to deem him our own.” 

With the exception of the chapter on Russian statistics the book covers 
thus much the same ground as the first 100 pages of Westergaard’s Contribu- 
tions to the History of Statistics and with much the same emphasis on popula- 
tion and vital statistics. The treatment, of course, is more detailed. A good 
deal of interesting tabular material from primary sources is reproduced and
extensive quotations are given from original texts. While considerable at- 
tention is paid by the author to the economic and social conditions of the 
period, the orientation of the book is basically technical. The emphasis on 
the “mathematical” (application of probability to population data) as distinguished
from the economic point of view was commented on forcefully
in the Russian reviews of the book which this reviewer has seen (Bolshevik, 
February 1946 and Sovetskaia Kniga, February 1946). 


Control Charts in Factory Management. William B. Rice (Consulting Business 
Statistician, 117 East California St., Pasadena, Calif.). New York 16: John 
Wiley & Sons, Inc. (440 Fourth Ave.), 1947. Pp. ix, 149. $2.50. (London W.C. 2: 
Chapman & Hall, Ltd. (Essex St.), 1947.) Two reviews follow:


REVIEW BY CHARLES D. FERRIS
Quality Control Engineer, General Electric Company, Bridgeport, Connecticut 


WITH the continued application of statistical quality control techniques
in industry there has come into being a great body of practical experience
in matching effectively the formulas of the mathematical statistician
and the variability of the manufactured product and process. This book, 
another in the Wiley Mathematical Statistics Series edited by Walter A. 
Shewhart, describes this synthesis generally as far as the organization of a 
statistical quality control program is concerned and considers specifically 
the role of the Shewhart control chart technique in the manufacturing phase 
of industry. 

Although the book is intended to stimulate the embryonic and practicing 
quality control engineer, its stated purpose is “to reach the busy practical 
men who run our factories . . . who can make most effective use of statistical 
quality control.” To this end the author describes in the introduction the 
concept of statistical control, its axioms, and its applications throughout the
enterprise. 

Chapters 1 through 5 are concerned with giving a concise but complete 
description of the use of control charts in the manufacturing phase of the 
business. Such a treatment properly starts with a discussion of the role of 
inspection in any such application. The development of scientific inspection 
to meet the needs of the philosophy of preventative inspection is traced. 
With the problem of controlling the process and its material there is natur- 
ally taken up the necessity of making valid predictions which in turn gives 
rise to a discussion of the degree of rational belief required and the system 
of chance causes that bring about process variability. The benefits of a con- 
trolled process and the use of statistical techniques to effect and maintain 
it most economically through small samples from rational subgroups are 
then introduced. This completes the subject matter of the first two chapters. 
Chapter 3 describes the construction and interpretation of X̄ and R charts
when variable measurements are available. Chapter 4 takes up the use of p 
and c charts where inspection is on an attribute basis. Chapter 5 illustrates 
the principles and techniques considered in the foregoing chapters by fifteen 
case histories of control chart practice taken from every phase of the manu- 
facturing operation. 

Chapter 6, the final chapter, describes the many factors that must be 
kept in mind in organizing a statistical quality control program to promote 
the use of the approach considered in the previous pages. 

This book is well written and straightforward. It is quite evident that the 
author has had a good deal of experience in the field of quality control and 
in using the statistical techniques he describes. From the viewpoint of a 
factory executive, it probably gives as clear an idea as any of the existing 
literature of the objectives of quality control and the manner in which it uses 
the control chart in the manufacturing process. This is particularly true of 
the introduction, Chapters 1 and 2, the case histories in Chapter 5, and those 
sections of the last chapter dealing with consulting, organization problems, 
and qualifications of quality control personnel. It is doubtful whether a 
vice president or manager of manufacturing will concern himself with the 
specific computations and interpretations of control charts or with the prob- 
lem of reporting, publicity, and sound data and statistics. That is the job of 
the quality control engineer and his associates. 

Once the executive is aware of what can be done and provides for the organization
and suitable authority, the details and results are in the hands of
the quality control organization. At times the author speaks as though management
were actually setting control limits and sample sizes. For example,
on page 86, he says, “Management decided in this case to use n = 1,000 for 
convenience in computing. ...” Certainly the only management repre- 
sented here would be the inspection, manufacturing, and quality control 
representatives following the job and not top management. Management’s 
real decisions are concerned with standards and protection. Once these are 
set up in conjunction with quality control, control charts are a tool to see
that they are maintained. Their maintenance and action are the responsi- 
bility of the quality control department which may delegate it to the lower 
levels of supervision. The problem of organization is often a knotty one. 
The section on the usefulness of a consultant is an excellent one, and the 
author makes a fine point when he stresses the importance of keeping the 
quality control department independent of operating department respon- 
sibilities and orders. 

This text is also written for the information and guidance of the novice 
and the experienced in the industrial application of statistical techniques. 
For these the chapters on construction of control charts and the sections on 
administration will be pertinent. 

The mechanics of construction of the X̄ and R charts are very clearly set
down. None of the theory underlying the limits is mentioned, which fact
will probably not interfere with the use of the techniques by factory person- 
nel. The book is unique, however, in not containing a normal curve or ex- 
planation thereof. A schematic diagram of a distribution indicating the re- 
sult of a causal system might have been helpful in this presentation. The 
fact that control limits are based on the independence of sample observations 
might have been stressed a little more explicitly. 
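
The mechanics referred to can be sketched in a few lines of Python. This is an illustration only and is not drawn from Mr. Rice's text: the function name `xbar_r_limits` and the measurements are hypothetical, and the factors A2, D3, and D4 are the commonly tabulated Shewhart values for subgroups of five, quoted here to three decimals.

```python
# Commonly tabulated control chart factors for subgroups of size 5.
A2, D3, D4 = 0.577, 0.0, 2.114

def xbar_r_limits(subgroups):
    """Return (lower limit, center line, upper limit) for the mean and range charts."""
    means = [sum(g) / len(g) for g in subgroups]
    ranges = [max(g) - min(g) for g in subgroups]
    grand_mean = sum(means) / len(means)
    mean_range = sum(ranges) / len(ranges)
    return {
        "xbar": (grand_mean - A2 * mean_range, grand_mean, grand_mean + A2 * mean_range),
        "range": (D3 * mean_range, mean_range, D4 * mean_range),
    }

# Hypothetical measurements: three rational subgroups of five items each.
data = [
    [10.0, 12.0, 11.0, 9.0, 13.0],
    [11.0, 10.0, 12.0, 13.0, 9.0],
    [12.0, 14.0, 10.0, 11.0, 13.0],
]
limits = xbar_r_limits(data)
print(limits["xbar"])
print(limits["range"])
```

The factors convert the average range into three-sigma limits, and that conversion assumes the observations within a subgroup are independent, which is the point the reviewer would have liked stressed.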

That part of the chapter dealing with p charts concerned with the analysis 
of a special study for assignable causes is very clear. There will be some dis- 
agreement in the section which deals with the setting of sample sizes for 
acceptance and rejection. The rule given on page 82 will necessitate unnecessarily
large sample sizes if it refers to the process average as might be interpreted.
If applied the formula $q^n = .1$ would seem more accurate and simpler
than the one provided. The protection provided by the quality limit, L, 
on page 82 is not clearly described. I believe that when it comes to incoming 
inspection it is much easier and more precise to accept and reject in terms of 
AOQL or Lot Tolerance protection. A control chart should, of course, be 
kept as a guide. 

The examples in Chapter 5 are best where they illustrate the use of con- 
trol charts in finding assignable causes of variability in special studies. In 
the case of process control it is not necessary that incoming batches repre- 
sent controlled quality in order that sampling be used as stated on page 97, 
and the acceptance number used here, considering the break even point, 
seems unduly severe. This writer would also have liked to see more emphasis 
on the operation of the control chart by the operator with the use of very 
simple charting techniques and the reestablishment of responsibility at the 
operator level. The section closes with control chart applications to produc- 
tion figures and costs. 

One of the most instructive portions of this book, from the viewpoint of 
the quality control engineer, is the last chapter. The author provides some 
excellent advice on the problem of getting authority and proper publicity for 
quality control and presenting the results to every level of management. It 
has been the experience of this writer that observance of these sound administrative
principles may well spell the difference between the success and
failure of an installation. 

This book adds another chapter to the growing literature on statistical 
quality control and may be read with profit both by those who wish to know 
what it may accomplish in the manufacturing phase and by those who would 
take part in carrying out the program. 


REVIEW BY N. LLOYD JOHNSON
Lecturer, University College, London 


IN THE preface to this book the author states that he has “striven to achieve
both clarity and brevity.” In this he has undoubtedly succeeded. Indeed
it may be said, without exaggeration, that the book is written in an unusually 
attractive style. It is both easy and interesting to read. Such qualities are 
rare in statistical literature and deserve recognition when they occur. How- 
ever, a book should have something important to say as well as an attractive 
way of saying it. It is in this respect that the present volume seems some- 
what lacking. It is, therefore, to be hoped that it will be read and appreciated 
by those “business executives” to whom it is “particularly dedicated.” So 
far as other classes of readers are concerned the contents of the book which 
are of most interest might have appeared as articles in periodical publica- 
tions with greater advantage. 

The book may be divided into four main sections. The introduction and 
the first two chapters describe statistical quality control in a general manner. 
Chapters 3 and 4 form an introduction to the techniques required. Chapter 
5 consists of fifteen case histories based on the author’s professional expe- 
rience. The final chapter deals with the practical and psychological problems 
involved in the initiation and maintenance of a quality control program. 

The fifth chapter is the high spot of the book. These fifteen case histories, 
discussed in a most engaging manner, provide pleasant reading and a valu- 
able factual background. They occupy rather more than one quarter of the 
total number of pages but are so varied that there is no suspicion of undue 
prolixity. These examples of the possibilities of statistical quality control 
will probably prove more impressive to the readers for whom the book is 
mainly intended than will the more general arguments of the earlier chapters. 

In the final chapter the author has much sound advice to offer on the prac- 
tical running of quality control. One dictum in particular should be ever- 
present in the minds of quality control propagandists. It is “quality control 
is not a religion.” Obvious, perhaps, but the author has felt it necessary to 
say so, and it is a good thing that it should be pointed out.

It is difficult to assess the value of Chapters 3 and 4. If the book is intended 
to be an advocacy of quality control for nontechnical executives, the inclu- 
sion of technical details in the construction of various types of control charts 
seems to be uncalled for. If, on the other hand, an introductory treatise for 
the embryo quality controller is intended, then the theoretical treatment is
inadequate. It is also, in parts, definitely incorrect. We are, indeed, warned in 
the preface that “the mathematical statistician will see something to criti- 
cize from the viewpoint of exact statement.” Even so, page 82 comes as a 
shock. Some details will be given, so that the reader may judge whether the 
deviation from exact statement is or is not excessive. It is desired to choose 
a sample “large enough so that at least 9 times out of 10 one or more defectives
will be found.” If 100p per cent be the average percentage defective,
orthodox theory gives the formula $-[\log_{10}(1-p)]^{-1}$ for the lower limit of sample
size. The book gives the formula $(\sqrt{2-p}+\sqrt{1-p})^2/p$. The sample
sizes required by the first formula are (for p fairly small) only about 40 per
cent of those required by the second. There are other instances where the 
deviations from exact statement have been so unnecessarily great as to con- 
fuse the ideas of the reader and to be possibly actually misleading. 
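
The comparison can be checked numerically. The Python sketch below is an illustration only: `n_exact` solves (1 - p)^n = 0.1 for n, which is the same quantity as the orthodox formula, and `n_book` takes the book's rule to be (sqrt(2 - p) + sqrt(1 - p))^2 / p, a reading of the printed formula that should be treated as an assumption. Both function names are hypothetical.

```python
import math

def n_exact(p, miss_prob=0.1):
    """Sample size at which the chance of finding no defective equals miss_prob."""
    return math.log(miss_prob) / math.log(1.0 - p)

def n_book(p):
    """Sample size under the assumed reading of the rule on page 82."""
    return (math.sqrt(2.0 - p) + math.sqrt(1.0 - p)) ** 2 / p

for p in (0.001, 0.01, 0.05):
    ratio = n_exact(p) / n_book(p)
    print(p, round(n_exact(p), 1), round(n_book(p), 1), round(ratio, 3))
```

For small p the ratio stays close to 0.40, in line with the figure quoted by the reviewer. In practice each result would be rounded up to a whole number of items.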

When one takes into consideration the fact that the best part of the book, 
the case histories, could with equal advantage have appeared in technical 
magazines, the value of the book as a whole must remain in doubt. 


Agricultural Price Analysis, Second Edition. Geoffrey S. Shepherd (Professor of 
Economics, Iowa State College). Ames, Iowa: Iowa State College Press, 1947. Pp. 
vii, 231. $3.25. 


REVIEW BY WARREN C. WAITE 
Professor of Agricultural Economics, University of Minnesota 


IT IS perhaps unfair to judge this volume separately from the triad of books 
which form the author’s contribution to the field of agricultural prices. 
The present volume is largely a revision of material contained in the author’s 
original Agricultural Price Analysis published in 1941. Some of the material 
in that earlier volume went into a book on Marketing of Farm Products, while 
some eleven chapters or more became the nucleus of this volume. The revi- 
sion represents an improvement in a number of places. Whatever faults of 
organization and selection of material the book may have, the author has 
the happy faculty of writing so that students understand him and this is an 
important attribute for a book directed at the undergraduate level. 

For the statistician, the volume will be of interest chiefly as a description 
or perhaps an exposé of the type of work done by the usual run of agricul- 
tural price analysts. There will be few technical methods that are new to him. 
The mathematical statistician will consider the methods crude, but never- 
theless they seem to have met with considerable success in many hands. The 
character of the problems and the data in the agricultural price field are 
such as to have led to extensive use of graphic and short-cut methods. This 
throws into prominence the liaison between economic theory and statistical 
analysis much more than in most other analyses of economic data. The direct- 
ness and simplicity of the methods warrant some consideration by our more 
mathematically minded friends. 

The author is strongly oriented toward agricultural policy matters and 
the book opens and closes on this theme. The first two chapters set the gen- 
eral nature of the agricultural price problem. The widely accepted view is 
expressed that domestic demand for farm products is likely to increase less 
rapidly than their supply and when foreign demands decline our price prob- 
lems will be intensified. The long run seems to call for a definite policy ex- 
tending well beyond a simple return to competitive markets. The fallacy of 
the solution through support at some level relative to parity prices, which 
underlies much of our present policy, is made clear in the two concluding 
chapters. 

The central portion of eleven chapters is more directly concerned with 
what may be termed methods of price analysis. This portion begins with a 
chapter dealing with individual cycles of agricultural products and exten- 
sive use is made of the “cobweb theorem” in the development. Two chapters 
on demand and elasticity follow, and these are among the best of existing 
statements for beginning students. After a short discussion on supply curves, 
there are three successive chapters dealing with problems of measuring 
changes in demand and supply. These culminate in a description of the 
method of graphic multiple correlation, which has become the standard tool 
for a great deal of agricultural price analysis. The discussion of individual 
sales and cost curves is accepted economic theory, but agricultural price 
analysts find less use for this apparatus than the general theorist; in fact, 
almost none at all. When the author turns to price discrimination, the analy- 
sis again becomes more pertinent and this section is especially well done. 

The book fulfills the opening promise that “principles of economic theory 
and methods of statistical analysis are applied to the study of agricultural 
prices.” 


The Ground of Induction. Donald Williams (Associate Professor of Philosophy, 
Harvard University). Cambridge, Mass.: Harvard University Press, 1947. Pp. ix, 
213. $3.00. 


REVIEW BY C. WEST CHURCHMAN 
Assistant Professor of Philosophy, University of Pennsylvania 


The author develops a theory of induction which follows closely and criti- 
cally the “classical” notion of probability based on the principle of in- 
sufficient reason. He argues that his theory of induction is as much “logic” 
as is the traditional theory of the syllogism, and as much rooted in man’s 
approach to his problems. As a consequence, such a basic problem as that of 
“randomness,” which so insistently besets the practicing statistician, is 
handled in a thoroughly impractical manner: “a selection is fair unless it is 
known not to be fair” (p. 72). The quality control engineer is thus asked to 
determine his state of ignorance in finding out whether his samples are repre- 
sentative of the lot. One wonders what happens to the producer’s and con- 
sumer’s risks in this case. States of ignorance are now known to be experi- 
mentally determinable, and the suggestion of a principle of sufficient igno- 
rance makes as thorough demands on the a priori knowledge of the practicing 
statistician as do the distributional presuppositions which Williams discards. 
Indeed, it may be far less safe to assume a state of ignorance than it is to 
make other “stochastic” assumptions demanded within the frequency theory 
of probability. 

It is to be feared that the treatment of statistical data outlined in Chapter 
4, on “The Probability of Induction,” leaves something to be desired in the 
way of rigor, though the author’s inaccuracies may be a result of a desire for 
simplification. In any case, he should have cautioned the reader against the 
universality of his rules concerning ranges involving the standard deviation. 
Again, the cumbersome derivation of a confidence interval for a true prob- 
ability of an event on the basis of a sample (pp. 100-101), and its equally 
cumbersome explanation, could have been made more accurate and at the 
same time more understandable by following the usual techniques. 

Finally, in addition to expounding an eighteenth century probability 
theory, the author now expounds an eighteenth century (Kantian) philos- 
ophy of science; he asserts that the “principles of ‘inductive methodology’ 
are as... necessary as any of the theorems of mathematical logic” (p. 124). 
The tone of this assertion leads the reader to suppose that methodological 
principles are not checked by empirical science; this is consequently a return 
to an age of psychological innocence in which man was supposed to view 
nature in his own way, but his own way of viewing nature was not investi- 
gatable by the empirical sciences. The academician can still be seen to preach 
the isolation of man’s reason from the consequences of his actions. 